2014-11-29 17:26:45

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged

Classic unix permission checks have an interesting feature. The
group permissions for a file can be set to less than the other
permissions on a file. Occasionally this is used deliberately to
give a certain group of users fewer permissions than the default.

User namespaces break this usage. Groups set in rgid or egid are
unaffected because an unprivileged user namespace creator can only
map a single group, so setresgid inside and outside the namespace
have the same effect. However, an unprivileged user namespace
creator can currently use setgroups(2) to drop all supplementary
groups, so, if a supplementary group denies access to some resource,
user namespaces can be used to bypass that restriction.

To fix this issue, this introduces a new user namespace flag
USERNS_SETGROUPS_ALLOWED. If that flag is not set, then
setgroups(2) will fail regardless of the caller's capabilities.

USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By
default, if the writer of gid_map has CAP_SETGID in the parent
userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is
not so privileged, then writing to gid_map will fail unless the
writer adds "setgroups deny" to gid_map, in which case the check is
skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.

The full semantics are:

If "setgroups allow" is present or no explicit "setgroups" setting
is written to gid_map, then writing to gid_map will fail with -EPERM
unless the opener and writer have CAP_SETGID in the parent namespace
and the parent namespace has USERNS_SETGROUPS_ALLOWED.

If "setgroups deny" is present, then writing gid_map will work as
before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will
result in processes in the userns that have CAP_SETGID to be
nontheless unable to use setgroups(2). If this breaks something
inside the userns, then this is okay -- the userns creator
specifically requested this behavior.

While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
namespace creator has no supplementary groups, doing so could be
surprising and could have unpleasant interactions with setns(2).

Any application that uses newgidmap(1) should be unaffected by this
fix, but unprivileged users that create user namespaces to
manipulate mounts or sandbox themselves will break until they start
using "setgroups deny".

This should fix CVE-2014-8989.

Cc: [email protected]
Signed-off-by: Andy Lutomirski <[email protected]>
---

Unlike v1, this *will* break things like Sandstorm. Fixing them will be
easy. I agree that this will result in better long-term semantics, but
I'm not so happy about breaking working software.

If this is unpalatable, here's a different option: get rid of all these
permission checks and just change setgroups. Specifically, make it so
that setgroups(2) in a userns will succeed but will silently refuse to
remove unmapped groups.

Changes from v1:
- Userns flags are now properly atomic.
- "setgroups allow" is now the default, so legacy unprivileged gid_map
writers will start to fail.

include/linux/user_namespace.h | 3 +++
kernel/groups.c | 3 +++
kernel/user.c | 1 +
kernel/user_namespace.c | 42 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 49 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..0ae4a8c97165 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,8 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
} extent[UID_GID_MAP_MAX_EXTENTS];
};

+#define USERNS_SETGROUPS_ALLOWED 0
+
struct user_namespace {
struct uid_gid_map uid_map;
struct uid_gid_map gid_map;
@@ -27,6 +29,7 @@ struct user_namespace {
kuid_t owner;
kgid_t group;
unsigned int proc_inum;
+ unsigned long flags;

/* Register of per-UID persistent keyrings for this namespace */
#ifdef CONFIG_PERSISTENT_KEYRINGS
diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f86cfa..b5ec42423202 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
#include <linux/slab.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/* init to 2 - one for init_task, one to ensure it is never freed */
@@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

+ if (!test_bit(USERNS_SETGROUPS_ALLOWED, &current_user_ns()->flags))
+ return -EPERM;
if (!ns_capable(current_user_ns(), CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..58fba8ea0845 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
.owner = GLOBAL_ROOT_UID,
.group = GLOBAL_ROOT_GID,
.proc_inum = PROC_USER_INIT_INO,
+ .flags = (1 << USERNS_SETGROUPS_ALLOWED),
#ifdef CONFIG_PERSISTENT_KEYRINGS
.persistent_keyring_register_sem =
__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..1f63935483e9 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf,
char *kbuf, *pos, *next_line;
ssize_t ret = -EINVAL;

+ bool may_setgroups = false;
+ bool setgroups_requested = true;
+ bool seen_explicit_setgroups = false;
+
/*
* The id_map_mutex serializes all writes to any given map.
*
@@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf,
if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
goto out;

+ if (map == &ns->gid_map) {
+ /*
+ * Setgroups is permitted if the writer and the
+ * parent ns are privileged.
+ */
+ may_setgroups =
+ test_bit(USERNS_SETGROUPS_ALLOWED,
+ &ns->parent->flags) &&
+ file_ns_capable(file, ns->parent, CAP_SETGID) &&
+ ns_capable(ns->parent, CAP_SETGID);
+ }
+
/* Get a buffer */
ret = -ENOMEM;
page = __get_free_page(GFP_TEMPORARY);
@@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf,
next_line = NULL;
}

+ /* Is this line a gid_map option? */
+ if (map == &ns->gid_map) {
+ if (!strcmp(pos, "setgroups deny")) {
+ if (seen_explicit_setgroups)
+ goto out;
+ seen_explicit_setgroups = true;
+ setgroups_requested = false;
+ continue;
+ } else if (!strcmp(pos, "setgroups allow")) {
+ if (seen_explicit_setgroups)
+ goto out;
+ seen_explicit_setgroups = true;
+ setgroups_requested = true;
+ continue;
+ }
+ }
+
pos = skip_spaces(pos);
extent->first = simple_strtoul(pos, &pos, 10);
if (!isspace(*pos))
@@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf,
extent->lower_first = lower_first;
}

+ /* Validate and install setgroups permission. */
+ if (map == &ns->gid_map && setgroups_requested) {
+ if (!may_setgroups) {
+ ret = -EPERM;
+ goto out;
+ }
+ set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
+ }
+
/* Install the map */
memcpy(map->extent, new_map.extent,
new_map.nr_extents*sizeof(new_map.extent[0]));
--
1.9.3


2014-12-02 12:12:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged

Andy Lutomirski <[email protected]> writes:

> Classic unix permission checks have an interesting feature. The
> group permissions for a file can be set to less than the other
> permissions on a file. Occasionally this is used deliberately to
> give a certain group of users fewer permissions than the default.
>
> User namespaces break this usage. Groups set in rgid or egid are
> unaffected because an unprivileged user namespace creator can only
> map a single group, so setresgid inside and outside the namespace
> have the same effect. However, an unprivileged user namespace
> creator can currently use setgroups(2) to drop all supplementary
> groups, so, if a supplementary group denies access to some resource,
> user namespaces can be used to bypass that restriction.
>
> To fix this issue, this introduces a new user namespace flag
> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then
> setgroups(2) will fail regardless of the caller's capabilities.
>
> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By
> default, if the writer of gid_map has CAP_SETGID in the parent
> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is
> not so privileged, then writing to gid_map will fail unless the
> writer adds "setgroups deny" to gid_map, in which case the check is
> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>
> The full semantics are:
>
> If "setgroups allow" is present or no explicit "setgroups" setting
> is written to gid_map, then writing to gid_map will fail with -EPERM
> unless the opener and writer have CAP_SETGID in the parent namespace
> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>
> If "setgroups deny" is present, then writing gid_map will work as
> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will
> result in processes in the userns that have CAP_SETGID to be
> nontheless unable to use setgroups(2). If this breaks something
> inside the userns, then this is okay -- the userns creator
> specifically requested this behavior.

I think we need to do this but I also think setgroups allow/deny
should be a separate knob than the uid/gid mapping.

If for no other reason than you missed at least two implementations of
setgroups, in your implementation.

> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
> namespace creator has no supplementary groups, doing so could be
> surprising and could have unpleasant interactions with setns(2).
>
> Any application that uses newgidmap(1) should be unaffected by this
> fix, but unprivileged users that create user namespaces to
> manipulate mounts or sandbox themselves will break until they start
> using "setgroups deny".
>
> This should fix CVE-2014-8989.
>
> Cc: [email protected]
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
>
> Unlike v1, this *will* break things like Sandstorm. Fixing them will be
> easy. I agree that this will result in better long-term semantics, but
> I'm not so happy about breaking working software.

I know what you mean. One of the pieces of software broken by all of
this is my test to verify the remount semantics. Which makes all of
this very unfortunate.

> If this is unpalatable, here's a different option: get rid of all these
> permission checks and just change setgroups. Specifically, make it so
> that setgroups(2) in a userns will succeed but will silently refuse to
> remove unmapped groups.

Nope silently refusing to remove unmapped groups is not enough. I can
make any gid in my supplemental groups my egid, it takes a sgid helper
application but I don't need any special privileges to create that.
Once that group is my egid I can map it. Which means I could drop
any one group of my choosing without privielges. Which out and out
breaks negative groups :(

I got to looking and I have a significant piece of code that all of this
breaks.

tools/testing/selftests/mount/unprivileged-remount-test.c

So I am extra motivated to figure out at find a way to preserve most of
the existing functionality. My regression tests won't pass until I can
find something pallateable.

It is very annoying that every option I have considered so far breaks
something useful.

Having a write once setgroups disable, and the allowing unprivileged
mappings after that seems the most palatable option I have seen,
semantically. Which means existing software that doesn't care about
setgroups can just add the disable code and then work otherwise
unmodified.

The other option that I have played with is forcing a set of groups
in setgroups if your user namespace was created without privilege,
that winds up requiring that verify you don't have any other
supplementary groups, and is generally messy whichever way I look at it.

*Pounds head on desk*

What a mess.

Eric

> Changes from v1:
> - Userns flags are now properly atomic.
> - "setgroups allow" is now the default, so legacy unprivileged gid_map
> writers will start to fail.
>
> include/linux/user_namespace.h | 3 +++
> kernel/groups.c | 3 +++
> kernel/user.c | 1 +
> kernel/user_namespace.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 49 insertions(+)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..0ae4a8c97165 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -17,6 +17,8 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
> } extent[UID_GID_MAP_MAX_EXTENTS];
> };
>
> +#define USERNS_SETGROUPS_ALLOWED 0
> +
> struct user_namespace {
> struct uid_gid_map uid_map;
> struct uid_gid_map gid_map;
> @@ -27,6 +29,7 @@ struct user_namespace {
> kuid_t owner;
> kgid_t group;
> unsigned int proc_inum;
> + unsigned long flags;
>
> /* Register of per-UID persistent keyrings for this namespace */
> #ifdef CONFIG_PERSISTENT_KEYRINGS
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 451698f86cfa..b5ec42423202 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -6,6 +6,7 @@
> #include <linux/slab.h>
> #include <linux/security.h>
> #include <linux/syscalls.h>
> +#include <linux/user_namespace.h>
> #include <asm/uaccess.h>
>
> /* init to 2 - one for init_task, one to ensure it is never freed */
> @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
> struct group_info *group_info;
> int retval;
>
> + if (!test_bit(USERNS_SETGROUPS_ALLOWED, &current_user_ns()->flags))
> + return -EPERM;
> if (!ns_capable(current_user_ns(), CAP_SETGID))
> return -EPERM;
> if ((unsigned)gidsetsize > NGROUPS_MAX)
> diff --git a/kernel/user.c b/kernel/user.c
> index 4efa39350e44..58fba8ea0845 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
> .owner = GLOBAL_ROOT_UID,
> .group = GLOBAL_ROOT_GID,
> .proc_inum = PROC_USER_INIT_INO,
> + .flags = (1 << USERNS_SETGROUPS_ALLOWED),
> #ifdef CONFIG_PERSISTENT_KEYRINGS
> .persistent_keyring_register_sem =
> __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index aa312b0dc3ec..1f63935483e9 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> char *kbuf, *pos, *next_line;
> ssize_t ret = -EINVAL;
>
> + bool may_setgroups = false;
> + bool setgroups_requested = true;
> + bool seen_explicit_setgroups = false;
> +
> /*
> * The id_map_mutex serializes all writes to any given map.
> *
> @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
> goto out;
>
> + if (map == &ns->gid_map) {
> + /*
> + * Setgroups is permitted if the writer and the
> + * parent ns are privileged.
> + */
> + may_setgroups =
> + test_bit(USERNS_SETGROUPS_ALLOWED,
> + &ns->parent->flags) &&
> + file_ns_capable(file, ns->parent, CAP_SETGID) &&
> + ns_capable(ns->parent, CAP_SETGID);
> + }
> +
> /* Get a buffer */
> ret = -ENOMEM;
> page = __get_free_page(GFP_TEMPORARY);
> @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> next_line = NULL;
> }
>
> + /* Is this line a gid_map option? */
> + if (map == &ns->gid_map) {
> + if (!strcmp(pos, "setgroups deny")) {
> + if (seen_explicit_setgroups)
> + goto out;
> + seen_explicit_setgroups = true;
> + setgroups_requested = false;
> + continue;
> + } else if (!strcmp(pos, "setgroups allow")) {
> + if (seen_explicit_setgroups)
> + goto out;
> + seen_explicit_setgroups = true;
> + setgroups_requested = true;
> + continue;
> + }
> + }
> +
> pos = skip_spaces(pos);
> extent->first = simple_strtoul(pos, &pos, 10);
> if (!isspace(*pos))
> @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> extent->lower_first = lower_first;
> }
>
> + /* Validate and install setgroups permission. */
> + if (map == &ns->gid_map && setgroups_requested) {
> + if (!may_setgroups) {
> + ret = -EPERM;
> + goto out;
> + }
> + set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
> + }
> +
> /* Install the map */
> memcpy(map->extent, new_map.extent,
> new_map.nr_extents*sizeof(new_map.extent[0]));

2014-12-02 18:54:15

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged

On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> Classic unix permission checks have an interesting feature. The
>> group permissions for a file can be set to less than the other
>> permissions on a file. Occasionally this is used deliberately to
>> give a certain group of users fewer permissions than the default.
>>
>> User namespaces break this usage. Groups set in rgid or egid are
>> unaffected because an unprivileged user namespace creator can only
>> map a single group, so setresgid inside and outside the namespace
>> have the same effect. However, an unprivileged user namespace
>> creator can currently use setgroups(2) to drop all supplementary
>> groups, so, if a supplementary group denies access to some resource,
>> user namespaces can be used to bypass that restriction.
>>
>> To fix this issue, this introduces a new user namespace flag
>> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then
>> setgroups(2) will fail regardless of the caller's capabilities.
>>
>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By
>> default, if the writer of gid_map has CAP_SETGID in the parent
>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is
>> not so privileged, then writing to gid_map will fail unless the
>> writer adds "setgroups deny" to gid_map, in which case the check is
>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>
>> The full semantics are:
>>
>> If "setgroups allow" is present or no explicit "setgroups" setting
>> is written to gid_map, then writing to gid_map will fail with -EPERM
>> unless the opener and writer have CAP_SETGID in the parent namespace
>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>
>> If "setgroups deny" is present, then writing gid_map will work as
>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will
>> result in processes in the userns that have CAP_SETGID to be
>> nontheless unable to use setgroups(2). If this breaks something
>> inside the userns, then this is okay -- the userns creator
>> specifically requested this behavior.
>
> I think we need to do this but I also think setgroups allow/deny
> should be a separate knob than the uid/gid mapping.

Yeah. It should be readable, too.

>
> If for no other reason than you missed at least two implementations of
> setgroups, in your implementation.

I clearly didn't grep hard enough. Grr.

>
>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>> namespace creator has no supplementary groups, doing so could be
>> surprising and could have unpleasant interactions with setns(2).
>>
>> Any application that uses newgidmap(1) should be unaffected by this
>> fix, but unprivileged users that create user namespaces to
>> manipulate mounts or sandbox themselves will break until they start
>> using "setgroups deny".
>>
>> This should fix CVE-2014-8989.
>>
>> Cc: [email protected]
>> Signed-off-by: Andy Lutomirski <[email protected]>
>> ---
>>
>> Unlike v1, this *will* break things like Sandstorm. Fixing them will be
>> easy. I agree that this will result in better long-term semantics, but
>> I'm not so happy about breaking working software.
>
> I know what you mean. One of the pieces of software broken by all of
> this is my test to verify the remount semantics. Which makes all of
> this very unfortunate.
>
>> If this is unpalatable, here's a different option: get rid of all these
>> permission checks and just change setgroups. Specifically, make it so
>> that setgroups(2) in a userns will succeed but will silently refuse to
>> remove unmapped groups.
>
> Nope silently refusing to remove unmapped groups is not enough. I can
> make any gid in my supplemental groups my egid, it takes a sgid helper
> application but I don't need any special privileges to create that.
> Once that group is my egid I can map it. Which means I could drop
> any one group of my choosing without privielges. Which out and out
> breaks negative groups :(

Whoops, right. And you can, indeed, have egid match one of your
supplementary groups.

>
> I got to looking and I have a significant piece of code that all of this
> breaks.
>
> tools/testing/selftests/mount/unprivileged-remount-test.c
>
> So I am extra motivated to figure out at find a way to preserve most of
> the existing functionality. My regression tests won't pass until I can
> find something pallateable.
>
> It is very annoying that every option I have considered so far breaks
> something useful.
>
> Having a write once setgroups disable, and the allowing unprivileged
> mappings after that seems the most palatable option I have seen,
> semantically. Which means existing software that doesn't care about
> setgroups can just add the disable code and then work otherwise
> unmodified.
>
> The other option that I have played with is forcing a set of groups
> in setgroups if your user namespace was created without privilege,
> that winds up requiring that verify you don't have any other
> supplementary groups, and is generally messy whichever way I look at it.

How bad would the automatic selection of setgroups behavior really be?

Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
"deny", or "auto". It starts out as "auto" (or "deny" if it's set to
"deny" in the parent). Once any of the maps have been set,
userns_options becomes readonly. If you try to write to gid_map when
setgroups == auto, then it switches to "allow" or "deny" depending on
whether the writer has privilege.

This is nasty magical behavior, but it should DTRT for existing users,
and everyone can be updated to set the value explicitly.

FWIW, it might also make sense to move all of this stuff into
/proc/PID/userns. There may be races in which a setuid or otherwise
privileged helper pokes at more than one userns file but actually
modifies different namespaces each time. I don't know whether these
races matter. uid_map, gid_map, and projid_map could be symlinks.

--Andy

>
> *Pounds head on desk*
>
> What a mess.
>
> Eric
>
>> Changes from v1:
>> - Userns flags are now properly atomic.
>> - "setgroups allow" is now the default, so legacy unprivileged gid_map
>> writers will start to fail.
>>
>> include/linux/user_namespace.h | 3 +++
>> kernel/groups.c | 3 +++
>> kernel/user.c | 1 +
>> kernel/user_namespace.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 49 insertions(+)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index e95372654f09..0ae4a8c97165 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -17,6 +17,8 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
>> } extent[UID_GID_MAP_MAX_EXTENTS];
>> };
>>
>> +#define USERNS_SETGROUPS_ALLOWED 0
>> +
>> struct user_namespace {
>> struct uid_gid_map uid_map;
>> struct uid_gid_map gid_map;
>> @@ -27,6 +29,7 @@ struct user_namespace {
>> kuid_t owner;
>> kgid_t group;
>> unsigned int proc_inum;
>> + unsigned long flags;
>>
>> /* Register of per-UID persistent keyrings for this namespace */
>> #ifdef CONFIG_PERSISTENT_KEYRINGS
>> diff --git a/kernel/groups.c b/kernel/groups.c
>> index 451698f86cfa..b5ec42423202 100644
>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -6,6 +6,7 @@
>> #include <linux/slab.h>
>> #include <linux/security.h>
>> #include <linux/syscalls.h>
>> +#include <linux/user_namespace.h>
>> #include <asm/uaccess.h>
>>
>> /* init to 2 - one for init_task, one to ensure it is never freed */
>> @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>> struct group_info *group_info;
>> int retval;
>>
>> + if (!test_bit(USERNS_SETGROUPS_ALLOWED, &current_user_ns()->flags))
>> + return -EPERM;
>> if (!ns_capable(current_user_ns(), CAP_SETGID))
>> return -EPERM;
>> if ((unsigned)gidsetsize > NGROUPS_MAX)
>> diff --git a/kernel/user.c b/kernel/user.c
>> index 4efa39350e44..58fba8ea0845 100644
>> --- a/kernel/user.c
>> +++ b/kernel/user.c
>> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
>> .owner = GLOBAL_ROOT_UID,
>> .group = GLOBAL_ROOT_GID,
>> .proc_inum = PROC_USER_INIT_INO,
>> + .flags = (1 << USERNS_SETGROUPS_ALLOWED),
>> #ifdef CONFIG_PERSISTENT_KEYRINGS
>> .persistent_keyring_register_sem =
>> __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index aa312b0dc3ec..1f63935483e9 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>> char *kbuf, *pos, *next_line;
>> ssize_t ret = -EINVAL;
>>
>> + bool may_setgroups = false;
>> + bool setgroups_requested = true;
>> + bool seen_explicit_setgroups = false;
>> +
>> /*
>> * The id_map_mutex serializes all writes to any given map.
>> *
>> @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>> goto out;
>>
>> + if (map == &ns->gid_map) {
>> + /*
>> + * Setgroups is permitted if the writer and the
>> + * parent ns are privileged.
>> + */
>> + may_setgroups =
>> + test_bit(USERNS_SETGROUPS_ALLOWED,
>> + &ns->parent->flags) &&
>> + file_ns_capable(file, ns->parent, CAP_SETGID) &&
>> + ns_capable(ns->parent, CAP_SETGID);
>> + }
>> +
>> /* Get a buffer */
>> ret = -ENOMEM;
>> page = __get_free_page(GFP_TEMPORARY);
>> @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>> next_line = NULL;
>> }
>>
>> + /* Is this line a gid_map option? */
>> + if (map == &ns->gid_map) {
>> + if (!strcmp(pos, "setgroups deny")) {
>> + if (seen_explicit_setgroups)
>> + goto out;
>> + seen_explicit_setgroups = true;
>> + setgroups_requested = false;
>> + continue;
>> + } else if (!strcmp(pos, "setgroups allow")) {
>> + if (seen_explicit_setgroups)
>> + goto out;
>> + seen_explicit_setgroups = true;
>> + setgroups_requested = true;
>> + continue;
>> + }
>> + }
>> +
>> pos = skip_spaces(pos);
>> extent->first = simple_strtoul(pos, &pos, 10);
>> if (!isspace(*pos))
>> @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>> extent->lower_first = lower_first;
>> }
>>
>> + /* Validate and install setgroups permission. */
>> + if (map == &ns->gid_map && setgroups_requested) {
>> + if (!may_setgroups) {
>> + ret = -EPERM;
>> + goto out;
>> + }
>> + set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
>> + }
>> +
>> /* Install the map */
>> memcpy(map->extent, new_map.extent,
>> new_map.nr_extents*sizeof(new_map.extent[0]));



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-02 19:47:59

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged

Andy Lutomirski <[email protected]> writes:

> On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <[email protected]> wrote:
>> Andy Lutomirski <[email protected]> writes:
>>
>>> Classic unix permission checks have an interesting feature. The
>>> group permissions for a file can be set to less than the other
>>> permissions on a file. Occasionally this is used deliberately to
>>> give a certain group of users fewer permissions than the default.
>>>
>>> User namespaces break this usage. Groups set in rgid or egid are
>>> unaffected because an unprivileged user namespace creator can only
>>> map a single group, so setresgid inside and outside the namespace
>>> have the same effect. However, an unprivileged user namespace
>>> creator can currently use setgroups(2) to drop all supplementary
>>> groups, so, if a supplementary group denies access to some resource,
>>> user namespaces can be used to bypass that restriction.
>>>
>>> To fix this issue, this introduces a new user namespace flag
>>> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then
>>> setgroups(2) will fail regardless of the caller's capabilities.
>>>
>>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By
>>> default, if the writer of gid_map has CAP_SETGID in the parent
>>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>>> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is
>>> not so privileged, then writing to gid_map will fail unless the
>>> writer adds "setgroups deny" to gid_map, in which case the check is
>>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>>
>>> The full semantics are:
>>>
>>> If "setgroups allow" is present or no explicit "setgroups" setting
>>> is written to gid_map, then writing to gid_map will fail with -EPERM
>>> unless the opener and writer have CAP_SETGID in the parent namespace
>>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>>
>>> If "setgroups deny" is present, then writing gid_map will work as
>>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will
>>> result in processes in the userns that have CAP_SETGID to be
>>> nontheless unable to use setgroups(2). If this breaks something
>>> inside the userns, then this is okay -- the userns creator
>>> specifically requested this behavior.
>>
>> I think we need to do this but I also think setgroups allow/deny
>> should be a separate knob than the uid/gid mapping.
>
> Yeah. It should be readable, too.
>
>>
>> If for no other reason than you missed at least two implementations of
>> setgroups, in your implementation.
>
> I clearly didn't grep hard enough. Grr.
>
>>
>>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>>> namespace creator has no supplementary groups, doing so could be
>>> surprising and could have unpleasant interactions with setns(2).
>>>
>>> Any application that uses newgidmap(1) should be unaffected by this
>>> fix, but unprivileged users that create user namespaces to
>>> manipulate mounts or sandbox themselves will break until they start
>>> using "setgroups deny".
>>>
>>> This should fix CVE-2014-8989.
>>>
>>> Cc: [email protected]
>>> Signed-off-by: Andy Lutomirski <[email protected]>
>>> ---
>>>
>>> Unlike v1, this *will* break things like Sandstorm. Fixing them will be
>>> easy. I agree that this will result in better long-term semantics, but
>>> I'm not so happy about breaking working software.
>>
>> I know what you mean. One of the pieces of software broken by all of
>> this is my test to verify the remount semantics. Which makes all of
>> this very unfortunate.
>>
>>> If this is unpalatable, here's a different option: get rid of all these
>>> permission checks and just change setgroups. Specifically, make it so
>>> that setgroups(2) in a userns will succeed but will silently refuse to
>>> remove unmapped groups.
>>
>> Nope silently refusing to remove unmapped groups is not enough. I can
>> make any gid in my supplemental groups my egid, it takes a sgid helper
>> application but I don't need any special privileges to create that.
>> Once that group is my egid I can map it. Which means I could drop
>> any one group of my choosing without privielges. Which out and out
>> breaks negative groups :(
>
> Whoops, right. And you can, indeed, have egid match one of your
> supplementary groups.
>
>>
>> I got to looking and I have a significant piece of code that all of this
>> breaks.
>>
>> tools/testing/selftests/mount/unprivileged-remount-test.c
>>
>> So I am extra motivated to figure out at find a way to preserve most of
>> the existing functionality. My regression tests won't pass until I can
>> find something pallateable.
>>
>> It is very annoying that every option I have considered so far breaks
>> something useful.
>>
>> Having a write once setgroups disable, and the allowing unprivileged
>> mappings after that seems the most palatable option I have seen,
>> semantically. Which means existing software that doesn't care about
>> setgroups can just add the disable code and then work otherwise
>> unmodified.
>>
>> The other option that I have played with is forcing a set of groups
>> in setgroups if your user namespace was created without privilege,
>> that winds up requiring that verify you don't have any other
>> supplementary groups, and is generally messy whichever way I look at it.
>
> How bad would the automatic selection of setgroups behavior really be?
>
> Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
> "deny", or "auto". It starts out as "auto" (or "deny" if it's set to
> "deny" in the parent). Once any of the maps have been set,
> userns_options becomes readonly. If you try to write to gid_map when
> setgroups == auto, then it switches to "allow" or "deny" depending on
> whether the writer has privilege.
>
> This is nasty magical behavior, but it should DTRT for existing users,
> and everyone can be updated to set the value explicitly.

Rarely is everything updated unless there is a requirement for an
update.

For my code that cares an update is necessary anyway as it contains
a gratuitous setgroups(0, NULL).

Since we have to break applications breaking them loud and clear and
letting them set the flat to recover (if possible) seems the best we can
do. That at least allows someone to ask if they depend on setgroups or
init_groups.

> FWIW, it might also make sense to move all of this stuff into
> /proc/PID/userns. There may be races in which a setuid or otherwise
> privileged helper pokes at more than one userns file but actually
> modifies different namespaces each time. I don't know whether these
> races matter. uid_map, gid_map, and projid_map could be symlinks.

I don't see how moving these files as removing any races.

Eric

2014-12-02 20:13:55

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged

On Tue, Dec 2, 2014 at 11:45 AM, Eric W. Biederman
<[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <[email protected]> wrote:
>>> Andy Lutomirski <[email protected]> writes:
>>>
>>>> Classic unix permission checks have an interesting feature. The
>>>> group permissions for a file can be set to less than the other
>>>> permissions on a file. Occasionally this is used deliberately to
>>>> give a certain group of users fewer permissions than the default.
>>>>
>>>> User namespaces break this usage. Groups set in rgid or egid are
>>>> unaffected because an unprivileged user namespace creator can only
>>>> map a single group, so setresgid inside and outside the namespace
>>>> have the same effect. However, an unprivileged user namespace
>>>> creator can currently use setgroups(2) to drop all supplementary
>>>> groups, so, if a supplementary group denies access to some resource,
>>>> user namespaces can be used to bypass that restriction.
>>>>
>>>> To fix this issue, this introduces a new user namespace flag
>>>> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then
>>>> setgroups(2) will fail regardless of the caller's capabilities.
>>>>
>>>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By
>>>> default, if the writer of gid_map has CAP_SETGID in the parent
>>>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>>>> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is
>>>> not so privileged, then writing to gid_map will fail unless the
>>>> writer adds "setgroups deny" to gid_map, in which case the check is
>>>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>>>
>>>> The full semantics are:
>>>>
>>>> If "setgroups allow" is present or no explicit "setgroups" setting
>>>> is written to gid_map, then writing to gid_map will fail with -EPERM
>>>> unless the opener and writer have CAP_SETGID in the parent namespace
>>>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>>>
>>>> If "setgroups deny" is present, then writing gid_map will work as
>>>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will
>>>> result in processes in the userns that have CAP_SETGID to be
>>>> nontheless unable to use setgroups(2). If this breaks something
>>>> inside the userns, then this is okay -- the userns creator
>>>> specifically requested this behavior.
>>>
>>> I think we need to do this but I also think setgroups allow/deny
>>> should be a separate knob than the uid/gid mapping.
>>
>> Yeah. It should be readable, too.
>>
>>>
>>> If for no other reason than you missed at least two implementations of
>>> setgroups, in your implementation.
>>
>> I clearly didn't grep hard enough. Grr.
>>
>>>
>>>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>>>> namespace creator has no supplementary groups, doing so could be
>>>> surprising and could have unpleasant interactions with setns(2).
>>>>
>>>> Any application that uses newgidmap(1) should be unaffected by this
>>>> fix, but unprivileged users that create user namespaces to
>>>> manipulate mounts or sandbox themselves will break until they start
>>>> using "setgroups deny".
>>>>
>>>> This should fix CVE-2014-8989.
>>>>
>>>> Cc: [email protected]
>>>> Signed-off-by: Andy Lutomirski <[email protected]>
>>>> ---
>>>>
>>>> Unlike v1, this *will* break things like Sandstorm. Fixing them will be
>>>> easy. I agree that this will result in better long-term semantics, but
>>>> I'm not so happy about breaking working software.
>>>
>>> I know what you mean. One of the pieces of software broken by all of
>>> this is my test to verify the remount semantics. Which makes all of
>>> this very unfortunate.
>>>
>>>> If this is unpalatable, here's a different option: get rid of all these
>>>> permission checks and just change setgroups. Specifically, make it so
>>>> that setgroups(2) in a userns will succeed but will silently refuse to
>>>> remove unmapped groups.
>>>
>>> Nope silently refusing to remove unmapped groups is not enough. I can
>>> make any gid in my supplemental groups my egid, it takes a sgid helper
>>> application but I don't need any special privileges to create that.
>>> Once that group is my egid I can map it. Which means I could drop
>>> any one group of my choosing without privielges. Which out and out
>>> breaks negative groups :(
>>
>> Whoops, right. And you can, indeed, have egid match one of your
>> supplementary groups.
>>
>>>
>>> I got to looking and I have a significant piece of code that all of this
>>> breaks.
>>>
>>> tools/testing/selftests/mount/unprivileged-remount-test.c
>>>
>>> So I am extra motivated to figure out at find a way to preserve most of
>>> the existing functionality. My regression tests won't pass until I can
>>> find something pallateable.
>>>
>>> It is very annoying that every option I have considered so far breaks
>>> something useful.
>>>
>>> Having a write once setgroups disable, and the allowing unprivileged
>>> mappings after that seems the most palatable option I have seen,
>>> semantically. Which means existing software that doesn't care about
>>> setgroups can just add the disable code and then work otherwise
>>> unmodified.
>>>
>>> The other option that I have played with is forcing a set of groups
>>> in setgroups if your user namespace was created without privilege,
>>> that winds up requiring that verify you don't have any other
>>> supplementary groups, and is generally messy whichever way I look at it.
>>
>> How bad would the automatic selection of setgroups behavior really be?
>>
>> Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
>> "deny", or "auto". It starts out as "auto" (or "deny" if it's set to
>> "deny" in the parent). Once any of the maps have been set,
>> userns_options becomes readonly. If you try to write to gid_map when
>> setgroups == auto, then it switches to "allow" or "deny" depending on
>> whether the writer has privilege.
>>
>> This is nasty magical behavior, but it should DTRT for existing users,
>> and everyone can be updated to set the value explicitly.
>
> Rarely is everything updated unless there is a requirement for an
> update.
>
> For my code that cares an update is necessary anyway as it contains
> a gratuitous setgroups(0, NULL).
>
> Since we have to break applications breaking them loud and clear and
> letting them set the flat to recover (if possible) seems the best we can
> do. That at least allows someone to ask if they depend on setgroups or
> init_groups.

Fair enough.

Any thoughts on what the API should be for v3?

>
>> FWIW, it might also make sense to move all of this stuff into
>> /proc/PID/userns. There may be races in which a setuid or otherwise
>> privileged helper pokes at more than one userns file but actually
>> modifies different namespaces each time. I don't know whether these
>> races matter. uid_map, gid_map, and projid_map could be symlinks.
>
> I don't see how moving these files as removing any races.

It helps if you use openat to open the userns directory and of the
/proc infrastructure is smart enough to make that work.

Admittedly, I don't actually see a dangerous race right now.

--Andy

>
> Eric
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-02 20:27:53

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 1/3] userns: Avoid problems with negative groups


Classic unix permission checks have an interesting feature, the group
permissions for a file can be set to less than the other permissions
on a file. Occassionally this is used deliberately to give a certain
group of users fewer permissions than the default.

Overlooking negative groups has resulted in the permission checks for
setting up a group mapping in a user namespace to be too lax. Tighten
the permission checks in new_idmap_permitted to ensure that mapping
uids and gids into user namespaces without privilege will not result
in new combinations of credentials being available to the users.

When setting mappings without privilege only the creator of the user
namespace is interesting as all other users that have CAP_SETUID over
the user namespace will also have CAP_SETUID over the user namespaces
parent. So the scope of the unprivileged check is reduced to just
the case where cred->euid is the namespace creator.

For setting a uid mapping without privilege only euid is considered as
setresuid can set uid, suid and fsuid from euid without privielege
making any combination of uids possible with user namespaces already
possible without them.

For now seeting a gid mapping without privilege is removed. The only
possible set of credentials that would be safe without a gid mapping
(egid without any supplementary groups) just doesn't happen in practice
so would simply lead to unused untested code.

setgroups is modified to fail not only when the group ids do not
map but also when there are no gid mappings at all, preventing
setgroups(0, NULL) from succeeding when gid mappings have not been
established.

For a small class of applications this change breaks userspace
and removes useful functionality. This small class of applications
includes tools/testing/selftests/mount/unprivileged-remount-test.c

Most of the removed functionality will be added back with the
addition of a one way knob to disable setgroups. Once setgroups
is disabled setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map with
privilege this change will have no effect on them.

This should fix CVE-2014-8989.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/s390/kernel/compat_linux.c | 5 ++++-
include/linux/user_namespace.h | 10 ++++++++++
kernel/groups.c | 5 ++++-
kernel/uid16.c | 5 ++++-
kernel/user_namespace.c | 17 ++++++++++-------
5 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index ca38139423ae..21c91feeca2d 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -49,6 +49,7 @@
#include <linux/fadvise.h>
#include <linux/ipc.h>
#include <linux/slab.h>
+#include <linux/user_namespace.h>

#include <asm/types.h>
#include <asm/uaccess.h>
@@ -246,10 +247,12 @@ out:

COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplist)
{
+ struct user_namespace *user_ns = current_user_ns();
struct group_info *group_info;
int retval;

- if (!capable(CAP_SETGID))
+ if (!gid_mapping_possible(user_ns) ||
+ !capable(CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..26d5e8f5db97 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -46,6 +46,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
return ns;
}

+static inline bool gid_mapping_possible(const struct user_namespace *ns)
+{
+ return ns->gid_map.nr_extents != 0;
+}
+
extern int create_user_ns(struct cred *new);
extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred);
extern void free_user_ns(struct user_namespace *ns);
@@ -70,6 +75,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
return &init_user_ns;
}

+static inline bool gid_mapping_possible(const struct user_namespace *ns)
+{
+ return true;
+}
+
static inline int create_user_ns(struct cred *new)
{
return -EINVAL;
diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f86cfa..b9a6a5c7e100 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
#include <linux/slab.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/* init to 2 - one for init_task, one to ensure it is never freed */
@@ -220,10 +221,12 @@ out:

SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
{
+ struct user_namespace *user_ns = current_user_ns();
struct group_info *group_info;
int retval;

- if (!ns_capable(current_user_ns(), CAP_SETGID))
+ if (!gid_mapping_possible(user_ns) ||
+ !ns_capable(user_ns, CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bbbceff..602c7de2aa11 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -13,6 +13,7 @@
#include <linux/highuid.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/user_namespace.h>

#include <asm/uaccess.h>

@@ -173,10 +174,12 @@ out:

SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
{
+ struct user_namespace *user_ns = current_user_ns();
struct group_info *group_info;
int retval;

- if (!ns_capable(current_user_ns(), CAP_SETGID))
+ if (!gid_mapping_possible(user_ns) ||
+ !ns_capable(user_ns, CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..51d65b444951 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
struct uid_gid_map *new_map)
{
- /* Allow mapping to your own filesystem ids */
- if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+ const struct cred *cred = file->f_cred;
+
+ /* Allow a mapping without capabilities when allowing the root
+ * of the user namespace capabilities restricted to that id
+ * will not change the set of credentials available to that
+ * user.
+ */
+ if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+ uid_eq(ns->owner, cred->euid)) {
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
kuid_t uid = make_kuid(ns->parent, id);
- if (uid_eq(uid, file->f_cred->fsuid))
- return true;
- } else if (cap_setid == CAP_SETGID) {
- kgid_t gid = make_kgid(ns->parent, id);
- if (gid_eq(gid, file->f_cred->fsgid))
+ if (uid_eq(uid, cred->euid))
return true;
}
}
--
1.9.1

2014-12-02 20:30:34

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis


- Expose the knob to user space through a proc file /proc/<pid>/setgroups

A value of 0 means the setgroups system call is disabled in the
current processes user namespace and can not be enabled in the
future in this user namespace.

A value of 1 means the segtoups system call is enabled.

- Descedent user namespaces inherit the value of setgroups from
their parents.

- A proc file is used (instead of a sysctl) as sysctls
currently do not pass in a struct file so file_ns_capable
is unusable.

- Update new_idmap_permitted to allow unprivileged users to
establish a mapping of their own gid, as such mappings
can no longer allow dropping of supplemental groups.

This set of changes restores as much as possible the functionality
that was lost when new_idmap_permitted was modified to not allow
mappinges to be established without privilege.

As this fixes a regression from: "userns: Avoid problems with negative groups"
it is probably a candidate for a backport.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---

This patch still needs a little bit of love.
I need to take a hard look at the interaction of barriers and atomic ops,
and it seems I have at least one comment fix that needs to move elsewhere.

But this should be enough to move the conversation forward.

arch/s390/kernel/compat_linux.c | 1 +
fs/proc/base.c | 31 ++++++++++----
include/linux/user_namespace.h | 3 ++
kernel/groups.c | 1 +
kernel/uid16.c | 1 +
kernel/user.c | 1 +
kernel/user_namespace.c | 95 ++++++++++++++++++++++++++++++++++++++++-
7 files changed, 124 insertions(+), 9 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 21c91feeca2d..6d0ee1b089fb 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
int retval;

if (!gid_mapping_possible(user_ns) ||
+ !atomic_read(&user_ns->setgroups_allowed) ||
!capable(CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..4ebed9f01d97 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
#endif /* CONFIG_TASK_IO_ACCOUNTING */

#ifdef CONFIG_USER_NS
-static int proc_id_map_open(struct inode *inode, struct file *file,
+static int proc_userns_open(struct inode *inode, struct file *file,
const struct seq_operations *seq_ops)
{
struct user_namespace *ns = NULL;
@@ -2418,7 +2418,7 @@ err:
return ret;
}

-static int proc_id_map_release(struct inode *inode, struct file *file)
+static int proc_userns_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
struct user_namespace *ns = seq->private;
@@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)

static int proc_uid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+ return proc_userns_open(inode, file, &proc_uid_seq_operations);
}

static int proc_gid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+ return proc_userns_open(inode, file, &proc_gid_seq_operations);
}

static int proc_projid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_projid_seq_operations);
+ return proc_userns_open(inode, file, &proc_projid_seq_operations);
}

static const struct file_operations proc_uid_map_operations = {
@@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
.write = proc_uid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
};

static const struct file_operations proc_gid_map_operations = {
@@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
.write = proc_gid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
};

static const struct file_operations proc_projid_map_operations = {
@@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
.write = proc_projid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
+};
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+ return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
+}
+
+static const struct file_operations proc_setgroups_operations = {
+ .open = proc_setgroups_open,
+ .write = proc_setgroups_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = proc_userns_release,
};
#endif /* CONFIG_USER_NS */

@@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+ REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
#endif
#ifdef CONFIG_CHECKPOINT_RESTORE
REG("timers", S_IRUGO, proc_timers_operations),
@@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+ REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
#endif
};

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 26d5e8f5db97..1e8cb168b1d0 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,6 +27,7 @@ struct user_namespace {
kuid_t owner;
kgid_t group;
unsigned int proc_inum;
+ atomic_t setgroups_allowed;

/* Register of per-UID persistent keyrings for this namespace */
#ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -65,9 +66,11 @@ struct seq_operations;
extern const struct seq_operations proc_uid_seq_operations;
extern const struct seq_operations proc_gid_seq_operations;
extern const struct seq_operations proc_projid_seq_operations;
+extern const struct seq_operations proc_setgroups_seq_operations;
extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
#else

static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
diff --git a/kernel/groups.c b/kernel/groups.c
index b9a6a5c7e100..467ae954e859 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -226,6 +226,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
int retval;

if (!gid_mapping_possible(user_ns) ||
+ !atomic_read(&user_ns->setgroups_allowed) ||
!ns_capable(user_ns, CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602c7de2aa11..096962fa1975 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -179,6 +179,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
int retval;

if (!gid_mapping_possible(user_ns) ||
+ !atomic_read(&user_ns->setgroups_allowed) ||
!ns_capable(user_ns, CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..0d78759f7dbe 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
.owner = GLOBAL_ROOT_UID,
.group = GLOBAL_ROOT_GID,
.proc_inum = PROC_USER_INIT_INO,
+ .setgroups_allowed = ATOMIC_INIT(1),
#ifdef CONFIG_PERSISTENT_KEYRINGS
.persistent_keyring_register_sem =
__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 51d65b444951..521c8d53ee17 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -98,6 +98,7 @@ int create_user_ns(struct cred *new)
ns->level = parent_ns->level + 1;
ns->owner = owner;
ns->group = group;
+ atomic_set(&ns->setgroups_allowed, atomic_read(&parent_ns->setgroups_allowed));

set_cred_user_ns(new, ns);

@@ -640,7 +641,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
if (!page)
goto out;

- /* Only allow <= page size writes at the beginning of the file */
+ /* Only allow < page size writes at the beginning of the file */
ret = -EINVAL;
if ((*ppos != 0) || (count >= PAGE_SIZE))
goto out;
@@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
kuid_t uid = make_kuid(ns->parent, id);
if (uid_eq(uid, cred->euid))
return true;
+ } else if (cap_setid == CAP_SETGID) {
+ kgid_t gid = make_kgid(ns->parent, id);
+ if (!atomic_read(&ns->setgroups_allowed) &&
+ gid_eq(gid, cred->egid))
+ return true;
}
}

@@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file,
return false;
}

+static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
+{
+ struct user_namespace *ns = seq->private;
+
+ return (*ppos == 0) ? ns : NULL;
+}
+
+static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+ ++*ppos;
+ return NULL;
+}
+
+static void setgroups_m_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int setgroups_m_show(struct seq_file *seq, void *v)
+{
+ struct user_namespace *ns = seq->private;
+
+ seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed));
+ return 0;
+}
+
+const struct seq_operations proc_setgroups_seq_operations = {
+ .start = setgroups_m_start,
+ .stop = setgroups_m_stop,
+ .next = setgroups_m_next,
+ .show = setgroups_m_show,
+};
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *seq = file->private_data;
+ struct user_namespace *ns = seq->private;
+ char kbuf[3];
+ int setgroups_allowed;
+ ssize_t ret;
+
+ ret = -EPERM;
+ if (!file_ns_capable(file, ns, CAP_SETGID))
+ goto out;
+
+ /* Only allow a very narrow range of strings to be written */
+ ret = -EINVAL;
+ if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
+ goto out;
+
+ /* What was written? */
+ ret = -EFAULT;
+ if (copy_from_user(kbuf, buf, count))
+ goto out;
+ kbuf[count] = '\0';
+
+ /* What is being requested? */
+ ret = -EINVAL;
+ if (kbuf[0] == '0')
+ setgroups_allowed = 0;
+ else if (kbuf[0] == '1')
+ setgroups_allowed = 1;
+ else
+ goto out;
+
+ /* Allow a trailing newline */
+ ret = -EINVAL;
+ if ((count == 2) && (kbuf[1] != '\n'))
+ goto out;
+
+
+ if (setgroups_allowed) {
+ ret = -EINVAL;
+ if (atomic_read(&ns->setgroups_allowed) == 0)
+ goto out;
+ } else {
+ atomic_set(&ns->setgroups_allowed, 0);
+ /* sigh memory barriers! */
+ }
+
+ /* Report a successful write */
+ *ppos = count;
+ ret = count;
+out:
+ return ret;
+}
+
static void *userns_get(struct task_struct *task)
{
struct user_namespace *user_ns;
--
1.9.1

2014-12-02 20:32:22

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests


A security fix in caused the way the unprivileged remount tests were
using user namespaces to break. Tweak the way user namespaces are
being used so the test works again.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---

This is what it takes to fix a broken application, in it's full glory.
This fix works even if new functionality does not exist.

tools/testing/selftests/mount/unprivileged-remount-test.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 9669d375625a..d47227494137 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -144,13 +144,12 @@ static void create_and_enter_userns(void)
strerror(errno));
}

+ if (access("/proc/self/setgroups", F_OK) == 0) {
+ write_file("/proc/self/setgroups", "0");
+ }
write_file("/proc/self/uid_map", "0 %d 1", uid);
write_file("/proc/self/gid_map", "0 %d 1", gid);

- if (setgroups(0, NULL) != 0) {
- die("setgroups failed: %s\n",
- strerror(errno));
- }
if (setgid(0) != 0) {
die ("setgid(0) failed %s\n",
strerror(errno));
--
1.9.1

2014-12-02 20:58:32

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups

On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
<[email protected]> wrote:
>
> Classic unix permission checks have an interesting feature, the group
> permissions for a file can be set to less than the other permissions
> on a file. Occassionally this is used deliberately to give a certain
> group of users fewer permissions than the default.
>
> Overlooking negative groups has resulted in the permission checks for
> setting up a group mapping in a user namespace to be too lax. Tighten
> the permission checks in new_idmap_permitted to ensure that mapping
> uids and gids into user namespaces without privilege will not result
> in new combinations of credentials being available to the users.
>
> When setting mappings without privilege only the creator of the user
> namespace is interesting as all other users that have CAP_SETUID over
> the user namespace will also have CAP_SETUID over the user namespaces
> parent. So the scope of the unprivileged check is reduced to just
> the case where cred->euid is the namespace creator.
>
> For setting a uid mapping without privilege only euid is considered as
> setresuid can set uid, suid and fsuid from euid without privielege
> making any combination of uids possible with user namespaces already
> possible without them.
>
> For now seeting a gid mapping without privilege is removed. The only
> possible set of credentials that would be safe without a gid mapping
> (egid without any supplementary groups) just doesn't happen in practice
> so would simply lead to unused untested code.
>
> setgroups is modified to fail not only when the group ids do not
> map but also when there are no gid mappings at all, preventing
> setgroups(0, NULL) from succeeding when gid mappings have not been
> established.
>
> For a small class of applications this change breaks userspace
> and removes useful functionality. This small class of applications
> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>
> Most of the removed functionality will be added back with the
> addition of a one way knob to disable setgroups. Once setgroups
> is disabled setting the gid_map becomes as safe as setting the uid_map.
>
> For more common applications that set the uid_map and the gid_map with
> privilege this change will have no effect on them.
>
> This should fix CVE-2014-8989.
>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---

>
> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
> +{
> + return ns->gid_map.nr_extents != 0;
> +}
> +

Can you rename this to userns_may_setgroups or something like that?
To me, gid_mapping_possible sounds like you're allowed to map gids,
which sounds like the opposite condition, and it doesn't explain what
the point is.


> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index aa312b0dc3ec..51d65b444951 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
> struct user_namespace *ns, int cap_setid,
> struct uid_gid_map *new_map)
> {
> - /* Allow mapping to your own filesystem ids */
> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
> + const struct cred *cred = file->f_cred;
> +
> + /* Allow a mapping without capabilities when allowing the root
> + * of the user namespace capabilities restricted to that id
> + * will not change the set of credentials available to that
> + * user.
> + */
> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
> + uid_eq(ns->owner, cred->euid)) {

What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by:

if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
goto out;

(except that I don't know why cap_valid(cap_setid) is checked -- this
ought to be enforced for projid_map, too, right?)

> u32 id = new_map->extent[0].lower_first;
> if (cap_setid == CAP_SETUID) {
> kuid_t uid = make_kuid(ns->parent, id);
> - if (uid_eq(uid, file->f_cred->fsuid))
> - return true;
> - } else if (cap_setid == CAP_SETGID) {
> - kgid_t gid = make_kgid(ns->parent, id);
> - if (gid_eq(gid, file->f_cred->fsgid))
> + if (uid_eq(uid, cred->euid))

Why'd you change this from fsuid to euid?

--Andy

2014-12-02 21:05:32

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis

On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
<[email protected]> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups

Can you rename this to something clearer, e.g. userns_setgroups_mode?

>
> A value of 0 means the setgroups system call is disabled in the
> current processes user namespace and can not be enabled in the
> future in this user namespace.
>
> A value of 1 means the segtoups system call is enabled.

Would it make more sense to put strings like "allow" and "deny" in
here? That way, future extensions could add additional values.

> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index 21c91feeca2d..6d0ee1b089fb 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
> int retval;
>
> if (!gid_mapping_possible(user_ns) ||
> + !atomic_read(&user_ns->setgroups_allowed) ||
> !capable(CAP_SETGID))
> return -EPERM;

This is now incomprehensible because of the gid_mapping_possible
thing. If you renamed gid_mapping_possible to
userns_setgroup_allowed, then this could be added to the
implementation, and this would all make sense (not to mention avoiding
duplicating this thing).

> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
> kuid_t uid = make_kuid(ns->parent, id);
> if (uid_eq(uid, cred->euid))
> return true;
> + } else if (cap_setid == CAP_SETGID) {
> + kgid_t gid = make_kgid(ns->parent, id);
> + if (!atomic_read(&ns->setgroups_allowed) &&
> + gid_eq(gid, cred->egid))
> + return true;

I still don't see why egid is any better than fsgid here.

> }
> }
>
> @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file,
> return false;
> }
>
> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> + struct user_namespace *ns = seq->private;
> +
> + return (*ppos == 0) ? ns : NULL;
> +}
> +
> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
> +{
> + ++*ppos;
> + return NULL;
> +}
> +
> +static void setgroups_m_stop(struct seq_file *seq, void *v)
> +{
> +}
> +
> +static int setgroups_m_show(struct seq_file *seq, void *v)
> +{
> + struct user_namespace *ns = seq->private;
> +
> + seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed));
> + return 0;
> +}
> +
> +const struct seq_operations proc_setgroups_seq_operations = {
> + .start = setgroups_m_start,
> + .stop = setgroups_m_stop,
> + .next = setgroups_m_next,
> + .show = setgroups_m_show,
> +};
> +
> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct seq_file *seq = file->private_data;
> + struct user_namespace *ns = seq->private;
> + char kbuf[3];
> + int setgroups_allowed;
> + ssize_t ret;
> +
> + ret = -EPERM;
> + if (!file_ns_capable(file, ns, CAP_SETGID))
> + goto out;

CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's
reconfiguring the namespace.

> +
> + /* Only allow a very narrow range of strings to be written */
> + ret = -EINVAL;
> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
> + goto out;
> +
> + /* What was written? */
> + ret = -EFAULT;
> + if (copy_from_user(kbuf, buf, count))
> + goto out;
> + kbuf[count] = '\0';
> +
> + /* What is being requested? */
> + ret = -EINVAL;
> + if (kbuf[0] == '0')
> + setgroups_allowed = 0;
> + else if (kbuf[0] == '1')
> + setgroups_allowed = 1;
> + else
> + goto out;
> +
> + /* Allow a trailing newline */
> + ret = -EINVAL;
> + if ((count == 2) && (kbuf[1] != '\n'))
> + goto out;
> +
> +
> + if (setgroups_allowed) {
> + ret = -EINVAL;
> + if (atomic_read(&ns->setgroups_allowed) == 0)
> + goto out;
> + } else {

I would disallow this if gid_map has been written in the interest of sanity.

> + atomic_set(&ns->setgroups_allowed, 0);
> + /* sigh memory barriers! */

I don't think that any barriers are needed. If you ever observe
setgroups_allowed == 0, it will stay that way forever.

> + }
> +
> + /* Report a successful write */
> + *ppos = count;
> + ret = count;
> +out:
> + return ret;
> +}
> +
> static void *userns_get(struct task_struct *task)
> {
> struct user_namespace *user_ns;

--Andy

2014-12-02 21:28:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups

Andy Lutomirski <[email protected]> writes:

> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> Classic unix permission checks have an interesting feature, the group
>> permissions for a file can be set to less than the other permissions
>> on a file. Occassionally this is used deliberately to give a certain
>> group of users fewer permissions than the default.
>>
>> Overlooking negative groups has resulted in the permission checks for
>> setting up a group mapping in a user namespace to be too lax. Tighten
>> the permission checks in new_idmap_permitted to ensure that mapping
>> uids and gids into user namespaces without privilege will not result
>> in new combinations of credentials being available to the users.
>>
>> When setting mappings without privilege only the creator of the user
>> namespace is interesting as all other users that have CAP_SETUID over
>> the user namespace will also have CAP_SETUID over the user namespaces
>> parent. So the scope of the unprivileged check is reduced to just
>> the case where cred->euid is the namespace creator.
>>
>> For setting a uid mapping without privilege only euid is considered as
>> setresuid can set uid, suid and fsuid from euid without privielege
>> making any combination of uids possible with user namespaces already
>> possible without them.
>>
>> For now seeting a gid mapping without privilege is removed. The only
>> possible set of credentials that would be safe without a gid mapping
>> (egid without any supplementary groups) just doesn't happen in practice
>> so would simply lead to unused untested code.
>>
>> setgroups is modified to fail not only when the group ids do not
>> map but also when there are no gid mappings at all, preventing
>> setgroups(0, NULL) from succeeding when gid mappings have not been
>> established.
>>
>> For a small class of applications this change breaks userspace
>> and removes useful functionality. This small class of applications
>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>
>> Most of the removed functionality will be added back with the
>> addition of a one way knob to disable setgroups. Once setgroups
>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>
>> For more common applications that set the uid_map and the gid_map with
>> privilege this change will have no effect on them.
>>
>> This should fix CVE-2014-8989.
>>
>> Cc: [email protected]
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>> ---
>
>>
>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>> +{
>> + return ns->gid_map.nr_extents != 0;
>> +}
>> +
>
> Can you rename this to userns_may_setgroups or something like that?
> To me, gid_mapping_possible sounds like you're allowed to map gids,
> which sounds like the opposite condition, and it doesn't explain what
> the point is.

gid_mapping_established?

What I mean to be testing if is if from_kgid and make_kgid will work
because the gid mappings have been set.

The userns knob for setgroups is a different test and is added
in the next patch. And yes we really need both or the knob can
start out as on, and we need to provent setgroups(0, NULL)
before the user namespace is unshared.

Although come to think about it probably makes sense to roll those two
test into one function and call that inline function from the setgroups
implementation.

Anyway I will think about it and see what I can do to make it easily
comprehensible.

>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index aa312b0dc3ec..51d65b444951 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>> struct user_namespace *ns, int cap_setid,
>> struct uid_gid_map *new_map)
>> {
>> - /* Allow mapping to your own filesystem ids */
>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>> + const struct cred *cred = file->f_cred;
>> +
>> + /* Allow a mapping without capabilities when allowing the root
>> + * of the user namespace capabilities restricted to that id
>> + * will not change the set of credentials available to that
>> + * user.
>> + */
>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>> + uid_eq(ns->owner, cred->euid)) {
>
> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by:

This means that the only user we attempt to set up unprivileged mappings
for is the owner of the user namespace. Anyone else should already
have capabilities in the parent user namespace or shouldn't be able to
set the mapping at all.

In practice it is a clarification to make it easier to think about the code.

> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
> goto out;
>
> (except that I don't know why cap_valid(cap_setid) is checked -- this
> ought to be enforced for projid_map, too, right?)

What to do with projid_map is entirely different discussion. In
practice it is dead, and either XFS needs to be fixed to use it
or that code needs to be removed. At the time I wrote it XFS
did not require any privileges to set project ids.

>> u32 id = new_map->extent[0].lower_first;
>> if (cap_setid == CAP_SETUID) {
>> kuid_t uid = make_kuid(ns->parent, id);
>> - if (uid_eq(uid, file->f_cred->fsuid))
>> - return true;
>> - } else if (cap_setid == CAP_SETGID) {
>> - kgid_t gid = make_kgid(ns->parent, id);
>> - if (gid_eq(gid, file->f_cred->fsgid))
>> + if (uid_eq(uid, cred->euid))
>
> Why'd you change this from fsuid to euid?

Because strangely enough I can set euid to any other uid with
setresuid, but the same does not hold with fsuid.

So strictly speaking fsuid was actually wrong before. In practice
fsuid == euid so I don't think anyone will care. But I want very much
to enforce the rule that user namespaces can't give you any credentials
you couldn't get otherwise.

Eric

2014-12-02 21:47:54

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis

Andy Lutomirski <[email protected]> writes:

> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
> Can you rename this to something clearer, e.g. userns_setgroups_mode?

I am not certain that is any clearer. That just reads as wordier.

The userns bit is definitely confusing and wrong. Why should we need to
spell out the scope?

>> A value of 0 means the setgroups system call is disabled in the
>> current processes user namespace and can not be enabled in the
>> future in this user namespace.
>>
>> A value of 1 means the segtoups system call is enabled.
>
> Would it make more sense to put strings like "allow" and "deny" in
> here? That way, future extensions could add additional values.

If the implementation of the write side isn't too bad. I would love
to see precedent elsewhere in the kernel. What I don't expect to do
is have any values except setgroups are enabled and setgroups are
disabled.

I am fine with allowing for the possibility (that is just good
engineering) but I don't intend to seriously considering or
implementing other possibilities.

>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>> index 21c91feeca2d..6d0ee1b089fb 100644
>> --- a/arch/s390/kernel/compat_linux.c
>> +++ b/arch/s390/kernel/compat_linux.c
>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>> int retval;
>>
>> if (!gid_mapping_possible(user_ns) ||
>> + !atomic_read(&user_ns->setgroups_allowed) ||
>> !capable(CAP_SETGID))
>> return -EPERM;
>
> This is now incomprehensible because of the gid_mapping_possible
> thing. If you renamed gid_mapping_possible to
> userns_setgroup_allowed, then this could be added to the
> implementation, and this would all make sense (not to mention avoiding
> duplicating this thing).
>
>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>> kuid_t uid = make_kuid(ns->parent, id);
>> if (uid_eq(uid, cred->euid))
>> return true;
>> + } else if (cap_setid == CAP_SETGID) {
>> + kgid_t gid = make_kgid(ns->parent, id);
>> + if (!atomic_read(&ns->setgroups_allowed) &&
>> + gid_eq(gid, cred->egid))
>> + return true;
>
> I still don't see why egid is any better than fsgid here.

Answered in my earlier response fsgid was a goof.
I can set any gid to my egid with my existing permissions.
Show me how I can do that with fsgid or fsuid and I will be happy to use
those.


>> }
>> }
>>
>> @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file,
>> return false;
>> }
>>
>> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
>> +{
>> + struct user_namespace *ns = seq->private;
>> +
>> + return (*ppos == 0) ? ns : NULL;
>> +}
>> +
>> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
>> +{
>> + ++*ppos;
>> + return NULL;
>> +}
>> +
>> +static void setgroups_m_stop(struct seq_file *seq, void *v)
>> +{
>> +}
>> +
>> +static int setgroups_m_show(struct seq_file *seq, void *v)
>> +{
>> + struct user_namespace *ns = seq->private;
>> +
>> + seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed));
>> + return 0;
>> +}
>> +
>> +const struct seq_operations proc_setgroups_seq_operations = {
>> + .start = setgroups_m_start,
>> + .stop = setgroups_m_stop,
>> + .next = setgroups_m_next,
>> + .show = setgroups_m_show,
>> +};
>> +
>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>> + size_t count, loff_t *ppos)
>> +{
>> + struct seq_file *seq = file->private_data;
>> + struct user_namespace *ns = seq->private;
>> + char kbuf[3];
>> + int setgroups_allowed;
>> + ssize_t ret;
>> +
>> + ret = -EPERM;
>> + if (!file_ns_capable(file, ns, CAP_SETGID))
>> + goto out;
>
> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's
> reconfiguring the namespace.

Hmm. Maybe. It is an activity that is normally controlled by
CAP_SETGID.

Frankly I think the entire split up of the capability model is almost
totally broken. But I think CAP_SETGID is a close approximation of the
right thing here.

>> + /* Only allow a very narrow range of strings to be written */
>> + ret = -EINVAL;
>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>> + goto out;
>> +
>> + /* What was written? */
>> + ret = -EFAULT;
>> + if (copy_from_user(kbuf, buf, count))
>> + goto out;
>> + kbuf[count] = '\0';
>> +
>> + /* What is being requested? */
>> + ret = -EINVAL;
>> + if (kbuf[0] == '0')
>> + setgroups_allowed = 0;
>> + else if (kbuf[0] == '1')
>> + setgroups_allowed = 1;
>> + else
>> + goto out;
>> +
>> + /* Allow a trailing newline */
>> + ret = -EINVAL;
>> + if ((count == 2) && (kbuf[1] != '\n'))
>> + goto out;
>> +
>> +
>> + if (setgroups_allowed) {
>> + ret = -EINVAL;
>> + if (atomic_read(&ns->setgroups_allowed) == 0)
>> + goto out;
>> + } else {
>
> I would disallow this if gid_map has been written in the interest of
> sanity.

Not a chance. That is part of making this an independent knob. If
there is another reason for disabling setgroups you can flip this
knob even after mappings are established.

>> + atomic_set(&ns->setgroups_allowed, 0);
>> + /* sigh memory barriers! */
>
> I don't think that any barriers are needed. If you ever observe
> setgroups_allowed == 0, it will stay that way forever.

Likely. In practice the code works today.

But I need to review things closely to understand if there are barriers
needed. But especially since it is a write once knob we can get away
with a lot.

>> + }
>> +
>> + /* Report a successful write */
>> + *ppos = count;
>> + ret = count;
>> +out:
>> + return ret;
>> +}
>> +
>> static void *userns_get(struct task_struct *task)
>> {
>> struct user_namespace *user_ns;
>
> --Andy

2014-12-02 22:09:40

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups

On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>>
>>> Classic unix permission checks have an interesting feature, the group
>>> permissions for a file can be set to less than the other permissions
>>> on a file. Occassionally this is used deliberately to give a certain
>>> group of users fewer permissions than the default.
>>>
>>> Overlooking negative groups has resulted in the permission checks for
>>> setting up a group mapping in a user namespace to be too lax. Tighten
>>> the permission checks in new_idmap_permitted to ensure that mapping
>>> uids and gids into user namespaces without privilege will not result
>>> in new combinations of credentials being available to the users.
>>>
>>> When setting mappings without privilege only the creator of the user
>>> namespace is interesting as all other users that have CAP_SETUID over
>>> the user namespace will also have CAP_SETUID over the user namespaces
>>> parent. So the scope of the unprivileged check is reduced to just
>>> the case where cred->euid is the namespace creator.
>>>
>>> For setting a uid mapping without privilege only euid is considered as
>>> setresuid can set uid, suid and fsuid from euid without privielege
>>> making any combination of uids possible with user namespaces already
>>> possible without them.
>>>
>>> For now seeting a gid mapping without privilege is removed. The only
>>> possible set of credentials that would be safe without a gid mapping
>>> (egid without any supplementary groups) just doesn't happen in practice
>>> so would simply lead to unused untested code.
>>>
>>> setgroups is modified to fail not only when the group ids do not
>>> map but also when there are no gid mappings at all, preventing
>>> setgroups(0, NULL) from succeeding when gid mappings have not been
>>> established.
>>>
>>> For a small class of applications this change breaks userspace
>>> and removes useful functionality. This small class of applications
>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>>
>>> Most of the removed functionality will be added back with the
>>> addition of a one way knob to disable setgroups. Once setgroups
>>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>>
>>> For more common applications that set the uid_map and the gid_map with
>>> privilege this change will have no effect on them.
>>>
>>> This should fix CVE-2014-8989.
>>>
>>> Cc: [email protected]
>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>> ---
>>
>>>
>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>>> +{
>>> + return ns->gid_map.nr_extents != 0;
>>> +}
>>> +
>>
>> Can you rename this to userns_may_setgroups or something like that?
>> To me, gid_mapping_possible sounds like you're allowed to map gids,
>> which sounds like the opposite condition, and it doesn't explain what
>> the point is.
>
> gid_mapping_established?
>
> What I mean to be testing if is if from_kgid and make_kgid will work
> because the gid mappings have been set.

But why do you care whether from_kgid and make_kgid will work? If
they fail, then they fail. I think that the point is that you're
checking whether allowing setgroups to drop groups is safe, and that's
only barely the same condition.

>
> The userns knob for setgroups is a different test and is added
> in the next patch. And yes we really need both or the knob can
> start out as on, and we need to provent setgroups(0, NULL)
> before the user namespace is unshared.

Do you mean before it's mapped?

>
> Although come to think about it probably makes sense to roll those two
> test into one function and call that inline function from the setgroups
> implementation.

That's what I think, too.

>
> Anyway I will think about it and see what I can do to make it easily
> comprehensible.
>
>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index aa312b0dc3ec..51d65b444951 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>> struct user_namespace *ns, int cap_setid,
>>> struct uid_gid_map *new_map)
>>> {
>>> - /* Allow mapping to your own filesystem ids */
>>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>>> + const struct cred *cred = file->f_cred;
>>> +
>>> + /* Allow a mapping without capabilities when allowing the root
>>> + * of the user namespace capabilities restricted to that id
>>> + * will not change the set of credentials available to that
>>> + * user.
>>> + */
>>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>>> + uid_eq(ns->owner, cred->euid)) {
>>
>> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by:
>
> This means that the only user we attempt to set up unprivileged mappings
> for is the owner of the user namespace. Anyone else should already
> have capabilities in the parent user namespace or shouldn't be able to
> set the mapping at all.
>
> In practice it is a clarification to make it easier to think about the code.

But why? I don't see why this check is necessary or why it's relevant
to the current issue.

>
>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>> goto out;
>>
>> (except that I don't know why cap_valid(cap_setid) is checked -- this
>> ought to be enforced for projid_map, too, right?)
>
> What to do with projid_map is entirely different discussion. In
> practice it is dead, and either XFS needs to be fixed to use it
> or that code needs to be removed. At the time I wrote it XFS
> did not require any privileges to set project ids.
>
>>> u32 id = new_map->extent[0].lower_first;
>>> if (cap_setid == CAP_SETUID) {
>>> kuid_t uid = make_kuid(ns->parent, id);
>>> - if (uid_eq(uid, file->f_cred->fsuid))
>>> - return true;
>>> - } else if (cap_setid == CAP_SETGID) {
>>> - kgid_t gid = make_kgid(ns->parent, id);
>>> - if (gid_eq(gid, file->f_cred->fsgid))
>>> + if (uid_eq(uid, cred->euid))
>>
>> Why'd you change this from fsuid to euid?
>
> Because strangely enough I can set euid to any other uid with
> setresuid, but the same does not hold with fsuid.
>
> So strictly speaking fsuid was actually wrong before. In practice
> fsuid == euid so I don't think anyone will care. But I want very much
> to enforce the rule that user namespaces can't give you any credentials
> you couldn't get otherwise.

Fair enough. Want to split that into a separate patch, then?

--Andy

>
> Eric



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-02 22:17:42

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis

On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>>
>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>
>> Can you rename this to something clearer, e.g. userns_setgroups_mode?
>
> I am not certain that is any clearer. That just reads as wordier.
>
> The userns bit is definitely confusing and wrong. Why should we need to
> spell out the scope?

Because it's a property of the process' userns, not a property of the process.

userns_setgroups would be okay. (Arguably it should've been
userns_uid_map, too, but at least uid_map sounds obviously
namespace-related.)

>
>>> A value of 0 means the setgroups system call is disabled in the
>>> current processes user namespace and can not be enabled in the
>>> future in this user namespace.
>>>
>>> A value of 1 means the segtoups system call is enabled.
>>
>> Would it make more sense to put strings like "allow" and "deny" in
>> here? That way, future extensions could add additional values.
>
> If the implementation of the write side isn't too bad. I would love
> to see precedent elsewhere in the kernel. What I don't expect to do
> is have any values except setgroups are enabled and setgroups are
> disabled.

current_clocksource? I think there are lots of things like that.

>
> I am fine with allowing for the possibility (that is just good
> engineering) but I don't intend to seriously considering or
> implementing other possibilities.

I can imagine a mode where there's a fixed set of groups that are
forced set but other groups can be added and dropped. It would be
complicated to set up right, but someone might want it some day.

>
>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>>> index 21c91feeca2d..6d0ee1b089fb 100644
>>> --- a/arch/s390/kernel/compat_linux.c
>>> +++ b/arch/s390/kernel/compat_linux.c
>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>> int retval;
>>>
>>> if (!gid_mapping_possible(user_ns) ||
>>> + !atomic_read(&user_ns->setgroups_allowed) ||
>>> !capable(CAP_SETGID))
>>> return -EPERM;
>>
>> This is now incomprehensible because of the gid_mapping_possible
>> thing. If you renamed gid_mapping_possible to
>> userns_setgroup_allowed, then this could be added to the
>> implementation, and this would all make sense (not to mention avoiding
>> duplicating this thing).
>>
>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>> kuid_t uid = make_kuid(ns->parent, id);
>>> if (uid_eq(uid, cred->euid))
>>> return true;
>>> + } else if (cap_setid == CAP_SETGID) {
>>> + kgid_t gid = make_kgid(ns->parent, id);
>>> + if (!atomic_read(&ns->setgroups_allowed) &&
>>> + gid_eq(gid, cred->egid))
>>> + return true;
>>
>> I still don't see why egid is any better than fsgid here.
>
> Answered in my earlier response fsgid was a goof.
> I can set any gid to my egid with my existing permissions.
> Show me how I can do that with fsgid or fsuid and I will be happy to use
> those.

You can use your fsgid to make a setgid file, I think. But yes, point
taken, although as mentioned in the other thread, I think it would be
a lot clearer if it were a separate patch.


>>> +
>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>> + size_t count, loff_t *ppos)
>>> +{
>>> + struct seq_file *seq = file->private_data;
>>> + struct user_namespace *ns = seq->private;
>>> + char kbuf[3];
>>> + int setgroups_allowed;
>>> + ssize_t ret;
>>> +
>>> + ret = -EPERM;
>>> + if (!file_ns_capable(file, ns, CAP_SETGID))
>>> + goto out;
>>
>> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's
>> reconfiguring the namespace.
>
> Hmm. Maybe. It is an activity that is normally controlled by
> CAP_SETGID.
>
> Frankly I think the entire split up of the capability model is almost
> totally broken. But I think CAP_SETGID is a close approximation of the
> right thing here.

I agree that the cap model is screwy. But we use CAP_SYS_ADMIN for
everything else that changes the overall behavior of a namespace.

In any event, the only way it matters is for a non-ns owner in the
parent ns with CAP_SETGID set but not CAP_SYS_ADMIN. I'd argue that
CAP_SETGID should not be usable to make unrelated process' syscalls
fail.

>
>>> + /* Only allow a very narrow range of strings to be written */
>>> + ret = -EINVAL;
>>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>>> + goto out;
>>> +
>>> + /* What was written? */
>>> + ret = -EFAULT;
>>> + if (copy_from_user(kbuf, buf, count))
>>> + goto out;
>>> + kbuf[count] = '\0';
>>> +
>>> + /* What is being requested? */
>>> + ret = -EINVAL;
>>> + if (kbuf[0] == '0')
>>> + setgroups_allowed = 0;
>>> + else if (kbuf[0] == '1')
>>> + setgroups_allowed = 1;
>>> + else
>>> + goto out;
>>> +
>>> + /* Allow a trailing newline */
>>> + ret = -EINVAL;
>>> + if ((count == 2) && (kbuf[1] != '\n'))
>>> + goto out;
>>> +
>>> +
>>> + if (setgroups_allowed) {
>>> + ret = -EINVAL;
>>> + if (atomic_read(&ns->setgroups_allowed) == 0)
>>> + goto out;
>>> + } else {
>>
>> I would disallow this if gid_map has been written in the interest of
>> sanity.
>
> Not a chance. That is part of making this an independent knob. If
> there is another reason for disabling setgroups you can flip this
> knob even after mappings are established.

Then you really want CAP_SYS_ADMIN, I think.

>
>>> + atomic_set(&ns->setgroups_allowed, 0);
>>> + /* sigh memory barriers! */
>>
>> I don't think that any barriers are needed. If you ever observe
>> setgroups_allowed == 0, it will stay that way forever.
>
> Likely. In practice the code works today.
>
> But I need to review things closely to understand if there are barriers
> needed. But especially since it is a write once knob we can get away
> with a lot.
>

Yeah.

For long-term use, I kind of like the flags approach I added in the
other patch. It makes it easy to add more flags in the future.

In any event, I think the only barrier that's needed is when writing gid_map:

atomic_read / test_bit to determine whether unpriv mappings are okay;
smp_mb() or whatever the current _after_atomic thing is these days;
write mapping;

Although I'm not sure whether Linux supports any architectures that
can violate causality in the way that barrier is there to prevent.

--Andy

2014-12-02 22:51:09

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups

Andy Lutomirski <[email protected]> writes:

> On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <[email protected]> wrote:
>> Andy Lutomirski <[email protected]> writes:
>>
>>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
>>> <[email protected]> wrote:
>>>>
>>>> Classic unix permission checks have an interesting feature, the group
>>>> permissions for a file can be set to less than the other permissions
>>>> on a file. Occassionally this is used deliberately to give a certain
>>>> group of users fewer permissions than the default.
>>>>
>>>> Overlooking negative groups has resulted in the permission checks for
>>>> setting up a group mapping in a user namespace to be too lax. Tighten
>>>> the permission checks in new_idmap_permitted to ensure that mapping
>>>> uids and gids into user namespaces without privilege will not result
>>>> in new combinations of credentials being available to the users.
>>>>
>>>> When setting mappings without privilege only the creator of the user
>>>> namespace is interesting as all other users that have CAP_SETUID over
>>>> the user namespace will also have CAP_SETUID over the user namespaces
>>>> parent. So the scope of the unprivileged check is reduced to just
>>>> the case where cred->euid is the namespace creator.
>>>>
>>>> For setting a uid mapping without privilege only euid is considered as
>>>> setresuid can set uid, suid and fsuid from euid without privielege
>>>> making any combination of uids possible with user namespaces already
>>>> possible without them.
>>>>
>>>> For now seeting a gid mapping without privilege is removed. The only
>>>> possible set of credentials that would be safe without a gid mapping
>>>> (egid without any supplementary groups) just doesn't happen in practice
>>>> so would simply lead to unused untested code.
>>>>
>>>> setgroups is modified to fail not only when the group ids do not
>>>> map but also when there are no gid mappings at all, preventing
>>>> setgroups(0, NULL) from succeeding when gid mappings have not been
>>>> established.
>>>>
>>>> For a small class of applications this change breaks userspace
>>>> and removes useful functionality. This small class of applications
>>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>>>
>>>> Most of the removed functionality will be added back with the
>>>> addition of a one way knob to disable setgroups. Once setgroups
>>>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>>>
>>>> For more common applications that set the uid_map and the gid_map with
>>>> privilege this change will have no effect on them.
>>>>
>>>> This should fix CVE-2014-8989.
>>>>
>>>> Cc: [email protected]
>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>> ---
>>>
>>>>
>>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>>>> +{
>>>> + return ns->gid_map.nr_extents != 0;
>>>> +}
>>>> +
>>>
>>> Can you rename this to userns_may_setgroups or something like that?
>>> To me, gid_mapping_possible sounds like you're allowed to map gids,
>>> which sounds like the opposite condition, and it doesn't explain what
>>> the point is.
>>
>> gid_mapping_established?
>>
>> What I mean to be testing if is if from_kgid and make_kgid will work
>> because the gid mappings have been set.
>
> But why do you care whether from_kgid and make_kgid will work? If
> they fail, then they fail. I think that the point is that you're
> checking whether allowing setgroups to drop groups is safe, and that's
> only barely the same condition.

For all of the system calls to set or change uids and gids except
setgroups it happens to fall out that if there are no mappings set the
system calls fail. That is and was deliberate. However setgroups is
weird because it allows the case of 0 mappings and to maintain the
constraint that it fails when there are no mapping set (just like
everything else) that requires an additional test.

>> The userns knob for setgroups is a different test and is added
>> in the next patch. And yes we really need both or the knob can
>> start out as on, and we need to provent setgroups(0, NULL)
>> before the user namespace is unshared.
>
> Do you mean before it's mapped?

Right we need to prevent setgroups(0, NULL) before we set the gid
mapping.

>> Although come to think about it probably makes sense to roll those two
>> test into one function and call that inline function from the setgroups
>> implementation.
>
> That's what I think, too.
>
>>
>> Anyway I will think about it and see what I can do to make it easily
>> comprehensible.
>>
>>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>>> index aa312b0dc3ec..51d65b444951 100644
>>>> --- a/kernel/user_namespace.c
>>>> +++ b/kernel/user_namespace.c
>>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>>> struct user_namespace *ns, int cap_setid,
>>>> struct uid_gid_map *new_map)
>>>> {
>>>> - /* Allow mapping to your own filesystem ids */
>>>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>>>> + const struct cred *cred = file->f_cred;
>>>> +
>>>> + /* Allow a mapping without capabilities when allowing the root
>>>> + * of the user namespace capabilities restricted to that id
>>>> + * will not change the set of credentials available to that
>>>> + * user.
>>>> + */
>>>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>>>> + uid_eq(ns->owner, cred->euid)) {
>>>
>>> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by:
>>
>> This means that the only user we attempt to set up unprivileged mappings
>> for is the owner of the user namespace. Anyone else should already
>> have capabilities in the parent user namespace or shouldn't be able to
>> set the mapping at all.
>>
>> In practice it is a clarification to make it easier to think about the code.
>
> But why? I don't see why this check is necessary or why it's relevant
> to the current issue.

My goal in this check is to guarantee that any combination of uids and
gids in struct cred that you can obtain with mappings and a user
namespace you can also obtain without privilege without a user
namespace.

What limiting euid to ns->owner does is it guarantees that when a user
namespace is created without privilege root doesn't come along and set
the mapping using the unprivileged path. That is confusing to think
about and it is not necessary to support.

With ns->owner == euid I have the guarantee especially with the gids
that they wind up paired with a uid in struct cred that came from the
same user. Either that or someone set one of the mappings with
privilege.

With ns->owner == euid I can verify all of these things pretty
trivially. Without that check I don't have a clue how to verify
the pairing between uids and gids in the unprivileged mapping.

Does that make things clearer?

>>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>>> goto out;
>>>
>>> (except that I don't know why cap_valid(cap_setid) is checked -- this
>>> ought to be enforced for projid_map, too, right?)
>>
>> What to do with projid_map is entirely different discussion. In
>> practice it is dead, and either XFS needs to be fixed to use it
>> or that code needs to be removed. At the time I wrote it XFS
>> did not require any privileges to set project ids.
>>
>>>> u32 id = new_map->extent[0].lower_first;
>>>> if (cap_setid == CAP_SETUID) {
>>>> kuid_t uid = make_kuid(ns->parent, id);
>>>> - if (uid_eq(uid, file->f_cred->fsuid))
>>>> - return true;
>>>> - } else if (cap_setid == CAP_SETGID) {
>>>> - kgid_t gid = make_kgid(ns->parent, id);
>>>> - if (gid_eq(gid, file->f_cred->fsgid))
>>>> + if (uid_eq(uid, cred->euid))
>>>
>>> Why'd you change this from fsuid to euid?
>>
>> Because strangely enough I can set euid to any other uid with
>> setresuid, but the same does not hold with fsuid.
>>
>> So strictly speaking fsuid was actually wrong before. In practice
>> fsuid == euid so I don't think anyone will care. But I want very much
>> to enforce the rule that user namespaces can't give you any credentials
>> you couldn't get otherwise.
>
> Fair enough. Want to split that into a separate patch, then?

Strictly speaking it is part and parcel of the same thing but it
probably makes sense to split it out and emphasise and explain the
change.

Eric

2014-12-02 22:57:19

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups

On Tue, Dec 2, 2014 at 2:48 PM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <[email protected]> wrote:
>>> Andy Lutomirski <[email protected]> writes:
>>>
>>>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
>>>> <[email protected]> wrote:
>>>>>
>>>>> Classic unix permission checks have an interesting feature, the group
>>>>> permissions for a file can be set to less than the other permissions
>>>>> on a file. Occassionally this is used deliberately to give a certain
>>>>> group of users fewer permissions than the default.
>>>>>
>>>>> Overlooking negative groups has resulted in the permission checks for
>>>>> setting up a group mapping in a user namespace to be too lax. Tighten
>>>>> the permission checks in new_idmap_permitted to ensure that mapping
>>>>> uids and gids into user namespaces without privilege will not result
>>>>> in new combinations of credentials being available to the users.
>>>>>
>>>>> When setting mappings without privilege only the creator of the user
>>>>> namespace is interesting as all other users that have CAP_SETUID over
>>>>> the user namespace will also have CAP_SETUID over the user namespaces
>>>>> parent. So the scope of the unprivileged check is reduced to just
>>>>> the case where cred->euid is the namespace creator.
>>>>>
>>>>> For setting a uid mapping without privilege only euid is considered as
>>>>> setresuid can set uid, suid and fsuid from euid without privielege
>>>>> making any combination of uids possible with user namespaces already
>>>>> possible without them.
>>>>>
>>>>> For now seeting a gid mapping without privilege is removed. The only
>>>>> possible set of credentials that would be safe without a gid mapping
>>>>> (egid without any supplementary groups) just doesn't happen in practice
>>>>> so would simply lead to unused untested code.
>>>>>
>>>>> setgroups is modified to fail not only when the group ids do not
>>>>> map but also when there are no gid mappings at all, preventing
>>>>> setgroups(0, NULL) from succeeding when gid mappings have not been
>>>>> established.
>>>>>
>>>>> For a small class of applications this change breaks userspace
>>>>> and removes useful functionality. This small class of applications
>>>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>>>>
>>>>> Most of the removed functionality will be added back with the
>>>>> addition of a one way knob to disable setgroups. Once setgroups
>>>>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>>>>
>>>>> For more common applications that set the uid_map and the gid_map with
>>>>> privilege this change will have no effect on them.
>>>>>
>>>>> This should fix CVE-2014-8989.
>>>>>
>>>>> Cc: [email protected]
>>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>>> ---
>>>>
>>>>>
>>>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>>>>> +{
>>>>> + return ns->gid_map.nr_extents != 0;
>>>>> +}
>>>>> +
>>>>
>>>> Can you rename this to userns_may_setgroups or something like that?
>>>> To me, gid_mapping_possible sounds like you're allowed to map gids,
>>>> which sounds like the opposite condition, and it doesn't explain what
>>>> the point is.
>>>
>>> gid_mapping_established?
>>>
>>> What I mean to be testing if is if from_kgid and make_kgid will work
>>> because the gid mappings have been set.
>>
>> But why do you care whether from_kgid and make_kgid will work? If
>> they fail, then they fail. I think that the point is that you're
>> checking whether allowing setgroups to drop groups is safe, and that's
>> only barely the same condition.
>
> For all of the system calls to set or change uids and gids except
> setgroups it happens to fall out that if there are no mappings set the
> system calls fail. That is and was deliberate. However setgroups is
> weird because it allows the case of 0 mappings and to maintain the
> constraint that it fails when there are no mapping set (just like
> everything else) that requires an additional test.
>
>>> The userns knob for setgroups is a different test and is added
>>> in the next patch. And yes we really need both or the knob can
>>> start out as on, and we need to provent setgroups(0, NULL)
>>> before the user namespace is unshared.
>>
>> Do you mean before it's mapped?
>
> Right we need to prevent setgroups(0, NULL) before we set the gid
> mapping.

Fair enough.

If you factor this into a separate inline helper, it might be worth
adding a short comment to that effect. It could be as simple as:

static inline bool whatever(whatever) {
if (mapping is empty)
return false; /* setgroups with a nonempty set requires a
mapping; make sure that setgroups(0, NULL) does, too. */
...;
}

>
>>> Although come to think about it probably makes sense to roll those two
>>> test into one function and call that inline function from the setgroups
>>> implementation.
>>
>> That's what I think, too.
>>
>>>
>>> Anyway I will think about it and see what I can do to make it easily
>>> comprehensible.
>>>
>>>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>>>> index aa312b0dc3ec..51d65b444951 100644
>>>>> --- a/kernel/user_namespace.c
>>>>> +++ b/kernel/user_namespace.c
>>>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>>>> struct user_namespace *ns, int cap_setid,
>>>>> struct uid_gid_map *new_map)
>>>>> {
>>>>> - /* Allow mapping to your own filesystem ids */
>>>>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>>>>> + const struct cred *cred = file->f_cred;
>>>>> +
>>>>> + /* Allow a mapping without capabilities when allowing the root
>>>>> + * of the user namespace capabilities restricted to that id
>>>>> + * will not change the set of credentials available to that
>>>>> + * user.
>>>>> + */
>>>>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>>>>> + uid_eq(ns->owner, cred->euid)) {
>>>>
>>>> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by:
>>>
>>> This means that the only user we attempt to set up unprivileged mappings
>>> for is the owner of the user namespace. Anyone else should already
>>> have capabilities in the parent user namespace or shouldn't be able to
>>> set the mapping at all.
>>>
>>> In practice it is a clarification to make it easier to think about the code.
>>
>> But why? I don't see why this check is necessary or why it's relevant
>> to the current issue.
>
> My goal in this check is to guarantee that any combination of uids and
> gids in struct cred that you can obtain with mappings and a user
> namespace you can also obtain without privilege without a user
> namespace.
>
> What limiting euid to ns->owner does is it guarantees that when a user
> namespace is created without privilege root doesn't come along and set
> the mapping using the unprivileged path. That is confusing to think
> about and it is not necessary to support.
>
> With ns->owner == euid I have the guarantee especially with the gids
> that they wind up paired with a uid in struct cred that came from the
> same user. Either that or someone set one of the mappings with
> privilege.
>
> With ns->owner == euid I can verify all of these things pretty
> trivially. Without that check I don't have a clue how to verify
> the pairing between uids and gids in the unprivileged mapping.
>
> Does that make things clearer?

Yes. Thanks. It might pay to try to improve the comment. I
understand it with this explanation but I didn't when I just read the
comment.

>
>>>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>>>> goto out;
>>>>
>>>> (except that I don't know why cap_valid(cap_setid) is checked -- this
>>>> ought to be enforced for projid_map, too, right?)
>>>
>>> What to do with projid_map is entirely different discussion. In
>>> practice it is dead, and either XFS needs to be fixed to use it
>>> or that code needs to be removed. At the time I wrote it XFS
>>> did not require any privileges to set project ids.
>>>
>>>>> u32 id = new_map->extent[0].lower_first;
>>>>> if (cap_setid == CAP_SETUID) {
>>>>> kuid_t uid = make_kuid(ns->parent, id);
>>>>> - if (uid_eq(uid, file->f_cred->fsuid))
>>>>> - return true;
>>>>> - } else if (cap_setid == CAP_SETGID) {
>>>>> - kgid_t gid = make_kgid(ns->parent, id);
>>>>> - if (gid_eq(gid, file->f_cred->fsgid))
>>>>> + if (uid_eq(uid, cred->euid))
>>>>
>>>> Why'd you change this from fsuid to euid?
>>>
>>> Because strangely enough I can set euid to any other uid with
>>> setresuid, but the same does not hold with fsuid.
>>>
>>> So strictly speaking fsuid was actually wrong before. In practice
>>> fsuid == euid so I don't think anyone will care. But I want very much
>>> to enforce the rule that user namespaces can't give you any credentials
>>> you couldn't get otherwise.
>>
>> Fair enough. Want to split that into a separate patch, then?
>
> Strictly speaking it is part and parcel of the same thing but it
> probably makes sense to split it out and emphasise and explain the
> change.

Sounds good.

Anyway, time to do a combination of Real Work (tm) and dealing with
the fact that I found a whole family of vulnerabilities of
as-yet-unknown severity in arch/x86 this morning.

--Andy

2014-12-02 23:09:39

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis

Andy Lutomirski <[email protected]> writes:

> On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <[email protected]> wrote:
>> Andy Lutomirski <[email protected]> writes:
>>
>>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
>>> <[email protected]> wrote:
>>>>
>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>
>>> Can you rename this to something clearer, e.g. userns_setgroups_mode?
>>
>> I am not certain that is any clearer. That just reads as wordier.
>>
>> The userns bit is definitely confusing and wrong. Why should we need to
>> spell out the scope?
>
> Because it's a property of the process' userns, not a property of the process.
>
> userns_setgroups would be okay. (Arguably it should've been
> userns_uid_map, too, but at least uid_map sounds obviously
> namespace-related.)
>
>>
>>>> A value of 0 means the setgroups system call is disabled in the
>>>> current processes user namespace and can not be enabled in the
>>>> future in this user namespace.
>>>>
>>>> A value of 1 means the segtoups system call is enabled.
>>>
>>> Would it make more sense to put strings like "allow" and "deny" in
>>> here? That way, future extensions could add additional values.
>>
>> If the implementation of the write side isn't too bad. I would love
>> to see precedent elsewhere in the kernel. What I don't expect to do
>> is have any values except setgroups are enabled and setgroups are
>> disabled.
>
> current_clocksource? I think there are lots of things like that.
>
>>
>> I am fine with allowing for the possibility (that is just good
>> engineering) but I don't intend to seriously considering or
>> implementing other possibilities.
>
> I can imagine a mode where there's a fixed set of groups that are
> forced set but other groups can be added and dropped. It would be
> complicated to set up right, but someone might want it some day.

Yeah. I am defintiely not designing for that.

>>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>>>> index 21c91feeca2d..6d0ee1b089fb 100644
>>>> --- a/arch/s390/kernel/compat_linux.c
>>>> +++ b/arch/s390/kernel/compat_linux.c
>>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>>> int retval;
>>>>
>>>> if (!gid_mapping_possible(user_ns) ||
>>>> + !atomic_read(&user_ns->setgroups_allowed) ||
>>>> !capable(CAP_SETGID))
>>>> return -EPERM;
>>>
>>> This is now incomprehensible because of the gid_mapping_possible
>>> thing. If you renamed gid_mapping_possible to
>>> userns_setgroup_allowed, then this could be added to the
>>> implementation, and this would all make sense (not to mention avoiding
>>> duplicating this thing).
>>>
>>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>>> kuid_t uid = make_kuid(ns->parent, id);
>>>> if (uid_eq(uid, cred->euid))
>>>> return true;
>>>> + } else if (cap_setid == CAP_SETGID) {
>>>> + kgid_t gid = make_kgid(ns->parent, id);
>>>> + if (!atomic_read(&ns->setgroups_allowed) &&
>>>> + gid_eq(gid, cred->egid))
>>>> + return true;
>>>
>>> I still don't see why egid is any better than fsgid here.
>>
>> Answered in my earlier response fsgid was a goof.
>> I can set any gid to my egid with my existing permissions.
>> Show me how I can do that with fsgid or fsuid and I will be happy to use
>> those.
>
> You can use your fsgid to make a setgid file, I think. But yes, point
> taken, although as mentioned in the other thread, I think it would be
> a lot clearer if it were a separate patch.

Agreed.

>>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>>> + size_t count, loff_t *ppos)
>>>> +{
>>>> + struct seq_file *seq = file->private_data;
>>>> + struct user_namespace *ns = seq->private;
>>>> + char kbuf[3];
>>>> + int setgroups_allowed;
>>>> + ssize_t ret;
>>>> +
>>>> + ret = -EPERM;
>>>> + if (!file_ns_capable(file, ns, CAP_SETGID))
>>>> + goto out;
>>>
>>> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's
>>> reconfiguring the namespace.
>>
>> Hmm. Maybe. It is an activity that is normally controlled by
>> CAP_SETGID.
>>
>> Frankly I think the entire split up of the capability model is almost
>> totally broken. But I think CAP_SETGID is a close approximation of the
>> right thing here.
>
> I agree that the cap model is screwy. But we use CAP_SYS_ADMIN for
> everything else that changes the overall behavior of a namespace.
>
> In any event, the only way it matters is for a non-ns owner in the
> parent ns with CAP_SETGID set but not CAP_SYS_ADMIN. I'd argue that
> CAP_SETGID should not be usable to make unrelated process' syscalls
> fail.

That is a pretty decent argument. I will take a good stare at this
issue as I refresh these patches and see how close to perfection I can
make them.

>>>> + /* Only allow a very narrow range of strings to be written */
>>>> + ret = -EINVAL;
>>>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>>>> + goto out;
>>>> +
>>>> + /* What was written? */
>>>> + ret = -EFAULT;
>>>> + if (copy_from_user(kbuf, buf, count))
>>>> + goto out;
>>>> + kbuf[count] = '\0';
>>>> +
>>>> + /* What is being requested? */
>>>> + ret = -EINVAL;
>>>> + if (kbuf[0] == '0')
>>>> + setgroups_allowed = 0;
>>>> + else if (kbuf[0] == '1')
>>>> + setgroups_allowed = 1;
>>>> + else
>>>> + goto out;
>>>> +
>>>> + /* Allow a trailing newline */
>>>> + ret = -EINVAL;
>>>> + if ((count == 2) && (kbuf[1] != '\n'))
>>>> + goto out;
>>>> +
>>>> +
>>>> + if (setgroups_allowed) {
>>>> + ret = -EINVAL;
>>>> + if (atomic_read(&ns->setgroups_allowed) == 0)
>>>> + goto out;
>>>> + } else {
>>>
>>> I would disallow this if gid_map has been written in the interest of
>>> sanity.
>>
>> Not a chance. That is part of making this an independent knob. If
>> there is another reason for disabling setgroups you can flip this
>> knob even after mappings are established.
>
> Then you really want CAP_SYS_ADMIN, I think.
>
>>
>>>> + atomic_set(&ns->setgroups_allowed, 0);
>>>> + /* sigh memory barriers! */
>>>
>>> I don't think that any barriers are needed. If you ever observe
>>> setgroups_allowed == 0, it will stay that way forever.
>>
>> Likely. In practice the code works today.
>>
>> But I need to review things closely to understand if there are barriers
>> needed. But especially since it is a write once knob we can get away
>> with a lot.
>>
>
> Yeah.
>
> For long-term use, I kind of like the flags approach I added in the
> other patch. It makes it easy to add more flags in the future.

I thought a dedicated word might be easier. I don't know that it makes
much of a difference at this point.

> In any event, I think the only barrier that's needed is when writing gid_map:
>
> atomic_read / test_bit to determine whether unpriv mappings are okay;
> smp_mb() or whatever the current _after_atomic thing is these days;
> write mapping;
>
> Although I'm not sure whether Linux supports any architectures that
> can violate causality in the way that barrier is there to prevent.

The DEC Alpha at least used to be able to violate causality in ways
weird enough to confuse anyone, and the alpha still seems to be in the
tree. What keyed me in was the part in atomic_ops.txt that says these
things don't have barriers and you will probably need them, and I
remember spin locks tend to take care of all those issues for you.

Anyway thanks for your barrier pointer.

Eric

2014-12-02 23:17:58

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis

On Tue, Dec 2, 2014 at 3:07 PM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <[email protected]> wrote:
>>> Andy Lutomirski <[email protected]> writes:
>>>
>>>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
>>>> <[email protected]> wrote:
>>>>>
>>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>>
>>>> Can you rename this to something clearer, e.g. userns_setgroups_mode?
>>>
>>> I am not certain that is any clearer. That just reads as wordier.
>>>
>>> The userns bit is definitely confusing and wrong. Why should we need to
>>> spell out the scope?
>>
>> Because it's a property of the process' userns, not a property of the process.
>>
>> userns_setgroups would be okay. (Arguably it should've been
>> userns_uid_map, too, but at least uid_map sounds obviously
>> namespace-related.)
>>
>>>
>>>>> A value of 0 means the setgroups system call is disabled in the
>>>>> current processes user namespace and can not be enabled in the
>>>>> future in this user namespace.
>>>>>
>>>>> A value of 1 means the segtoups system call is enabled.
>>>>
>>>> Would it make more sense to put strings like "allow" and "deny" in
>>>> here? That way, future extensions could add additional values.
>>>
>>> If the implementation of the write side isn't too bad. I would love
>>> to see precedent elsewhere in the kernel. What I don't expect to do
>>> is have any values except setgroups are enabled and setgroups are
>>> disabled.
>>
>> current_clocksource? I think there are lots of things like that.
>>
>>>
>>> I am fine with allowing for the possibility (that is just good
>>> engineering) but I don't intend to seriously considering or
>>> implementing other possibilities.
>>
>> I can imagine a mode where there's a fixed set of groups that are
>> forced set but other groups can be added and dropped. It would be
>> complicated to set up right, but someone might want it some day.
>
> Yeah. I am defintiely not designing for that.
>
>>>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>>>>> index 21c91feeca2d..6d0ee1b089fb 100644
>>>>> --- a/arch/s390/kernel/compat_linux.c
>>>>> +++ b/arch/s390/kernel/compat_linux.c
>>>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>>>> int retval;
>>>>>
>>>>> if (!gid_mapping_possible(user_ns) ||
>>>>> + !atomic_read(&user_ns->setgroups_allowed) ||
>>>>> !capable(CAP_SETGID))
>>>>> return -EPERM;
>>>>
>>>> This is now incomprehensible because of the gid_mapping_possible
>>>> thing. If you renamed gid_mapping_possible to
>>>> userns_setgroup_allowed, then this could be added to the
>>>> implementation, and this would all make sense (not to mention avoiding
>>>> duplicating this thing).
>>>>
>>>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>>>> kuid_t uid = make_kuid(ns->parent, id);
>>>>> if (uid_eq(uid, cred->euid))
>>>>> return true;
>>>>> + } else if (cap_setid == CAP_SETGID) {
>>>>> + kgid_t gid = make_kgid(ns->parent, id);
>>>>> + if (!atomic_read(&ns->setgroups_allowed) &&
>>>>> + gid_eq(gid, cred->egid))
>>>>> + return true;
>>>>
>>>> I still don't see why egid is any better than fsgid here.
>>>
>>> Answered in my earlier response fsgid was a goof.
>>> I can set any gid to my egid with my existing permissions.
>>> Show me how I can do that with fsgid or fsuid and I will be happy to use
>>> those.
>>
>> You can use your fsgid to make a setgid file, I think. But yes, point
>> taken, although as mentioned in the other thread, I think it would be
>> a lot clearer if it were a separate patch.
>
> Agreed.
>
>>>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>>>> + size_t count, loff_t *ppos)
>>>>> +{
>>>>> + struct seq_file *seq = file->private_data;
>>>>> + struct user_namespace *ns = seq->private;
>>>>> + char kbuf[3];
>>>>> + int setgroups_allowed;
>>>>> + ssize_t ret;
>>>>> +
>>>>> + ret = -EPERM;
>>>>> + if (!file_ns_capable(file, ns, CAP_SETGID))
>>>>> + goto out;
>>>>
>>>> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's
>>>> reconfiguring the namespace.
>>>
>>> Hmm. Maybe. It is an activity that is normally controlled by
>>> CAP_SETGID.
>>>
>>> Frankly I think the entire split up of the capability model is almost
>>> totally broken. But I think CAP_SETGID is a close approximation of the
>>> right thing here.
>>
>> I agree that the cap model is screwy. But we use CAP_SYS_ADMIN for
>> everything else that changes the overall behavior of a namespace.
>>
>> In any event, the only way it matters is for a non-ns owner in the
>> parent ns with CAP_SETGID set but not CAP_SYS_ADMIN. I'd argue that
>> CAP_SETGID should not be usable to make unrelated process' syscalls
>> fail.
>
> That is a pretty decent argument. I will take a good stare at this
> issue as I refresh these patches and see how close to perfection I can
> make them.
>
>>>>> + /* Only allow a very narrow range of strings to be written */
>>>>> + ret = -EINVAL;
>>>>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>>>>> + goto out;
>>>>> +
>>>>> + /* What was written? */
>>>>> + ret = -EFAULT;
>>>>> + if (copy_from_user(kbuf, buf, count))
>>>>> + goto out;
>>>>> + kbuf[count] = '\0';
>>>>> +
>>>>> + /* What is being requested? */
>>>>> + ret = -EINVAL;
>>>>> + if (kbuf[0] == '0')
>>>>> + setgroups_allowed = 0;
>>>>> + else if (kbuf[0] == '1')
>>>>> + setgroups_allowed = 1;
>>>>> + else
>>>>> + goto out;
>>>>> +
>>>>> + /* Allow a trailing newline */
>>>>> + ret = -EINVAL;
>>>>> + if ((count == 2) && (kbuf[1] != '\n'))
>>>>> + goto out;
>>>>> +
>>>>> +
>>>>> + if (setgroups_allowed) {
>>>>> + ret = -EINVAL;
>>>>> + if (atomic_read(&ns->setgroups_allowed) == 0)
>>>>> + goto out;
>>>>> + } else {
>>>>
>>>> I would disallow this if gid_map has been written in the interest of
>>>> sanity.
>>>
>>> Not a chance. That is part of making this an independent knob. If
>>> there is another reason for disabling setgroups you can flip this
>>> knob even after mappings are established.
>>
>> Then you really want CAP_SYS_ADMIN, I think.
>>
>>>
>>>>> + atomic_set(&ns->setgroups_allowed, 0);
>>>>> + /* sigh memory barriers! */
>>>>
>>>> I don't think that any barriers are needed. If you ever observe
>>>> setgroups_allowed == 0, it will stay that way forever.
>>>
>>> Likely. In practice the code works today.
>>>
>>> But I need to review things closely to understand if there are barriers
>>> needed. But especially since it is a write once knob we can get away
>>> with a lot.
>>>
>>
>> Yeah.
>>
>> For long-term use, I kind of like the flags approach I added in the
>> other patch. It makes it easy to add more flags in the future.
>
> I thought a dedicated word might be easier. I don't know that it makes
> much of a difference at this point.
>
>> In any event, I think the only barrier that's needed is when writing gid_map:
>>
>> atomic_read / test_bit to determine whether unpriv mappings are okay;
>> smp_mb() or whatever the current _after_atomic thing is these days;
>> write mapping;
>>
>> Although I'm not sure whether Linux supports any architectures that
>> can violate causality in the way that barrier is there to prevent.
>
> The DEC Alpha at least used to be able to violate causality in ways
> weird enough to confuse anyone, and the alpha still seems to be in the
> tree. What keyed me in was the part in atomic_ops.txt that says these
> things don't have barriers and you will probably need them, and I
> remember spin locks tend to take care of all those issues for you.
>
> Anyway thanks for your barrier pointer.
>

My pointer was a bit off. Barriers synchronize with barriers; the
check for whether setgroups is okay may need a barrier as well:

static inline bool may_setgroups(whatever) {
if (no groups are mapped)
return false;
smp_rmb() (or _before_atomic, maybe -- I don't know whether that's
okay for atomic_read);
if (setgroups is disallowed by option)
return false;

return true;
}

It would be bad if there were a window in which we'd see a group
mapped but not see that setgroups is disallowed.

--Andy

2014-12-08 22:08:28

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings.


The rule is simple. Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it. So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated. Violated expectations about the
behavior of the OS is a long way to say a security issue.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..b99c862a2e3f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
struct uid_gid_map *new_map)
{
- /* Allow mapping to your own filesystem ids */
+ /* Don't allow mappings that would allow anything that wouldn't
+ * be allowed without the establishment of unprivileged mappings.
+ */
if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
--
1.9.1

2014-12-08 22:09:24

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished


setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established. Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/user_namespace.h | 9 +++++++++
kernel/groups.c | 7 ++++++-
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..41cc26e5a350 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -37,6 +37,15 @@ struct user_namespace {

extern struct user_namespace init_user_ns;

+static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
+{
+ bool established;
+ smp_mb__before_atomic();
+ established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
+ smp_mb__after_atomic();
+ return established;
+}
+
#ifdef CONFIG_USER_NS

static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
diff --git a/kernel/groups.c b/kernel/groups.c
index 02d8a251c476..e0335e44f76a 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
#include <linux/slab.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/* init to 2 - one for init_task, one to ensure it is never freed */
@@ -217,7 +218,11 @@ bool may_setgroups(void)
{
struct user_namespace *user_ns = current_user_ns();

- return ns_capable(user_ns, CAP_SETGID);
+ /* It is not safe to use setgroups until a gid mapping in
+ * the user namespace has been established.
+ */
+ return userns_gid_mappings_established(user_ns) &&
+ ns_capable(user_ns, CAP_SETGID);
}

/*
--
1.9.1

2014-12-08 22:10:01

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings


As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality. This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups. Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b99c862a2e3f..8e7c87162171 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
kuid_t uid = make_kuid(ns->parent, id);
if (uid_eq(uid, file->f_cred->fsuid))
return true;
- } else if (cap_setid == CAP_SETGID) {
- kgid_t gid = make_kgid(ns->parent, id);
- if (gid_eq(gid, file->f_cred->fsgid))
- return true;
}
}

--
1.9.1

2014-12-08 22:10:52

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping


setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid. Therefor it is safe to allow an unprivileged user to map their
euid and use CAP_SETUID privileged with exactly that uid, as no new
credentials can be obtained.

I can not find a combination of existing system calls that allows
setting uid, euid, suid, and fsuid from the fsuid making the previous
use of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 8e7c87162171..da1eeb927b21 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
kuid_t uid = make_kuid(ns->parent, id);
- if (uid_eq(uid, file->f_cred->fsuid))
+ if (uid_eq(uid, file->f_cred->euid))
return true;
}
}
--
1.9.1

2014-12-08 22:11:53

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished

On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <[email protected]> wrote:
>
> setgroups is unique in not needing a valid mapping before it can be called,
> in the case of setgroups(0, NULL) which drops all supplemental groups.
>
> The design of the user namespace assumes that CAP_SETGID can not actually
> be used until a gid mapping is established. Therefore add a helper function
> to see if the user namespace gid mapping has been established and call
> that function in the setgroups permission check.
>
> This is part of the fix for CVE-2014-8989, being able to drop groups
> without privilege using user namespaces.
>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> include/linux/user_namespace.h | 9 +++++++++
> kernel/groups.c | 7 ++++++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..41cc26e5a350 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -37,6 +37,15 @@ struct user_namespace {
>
> extern struct user_namespace init_user_ns;
>
> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
> +{
> + bool established;
> + smp_mb__before_atomic();
> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
> + smp_mb__after_atomic();
> + return established;
> +}

I don't think this works on all platforms. ACCESS_ONCE is not atomic
in the smp_mb__before_atomic sense.

> +
> #ifdef CONFIG_USER_NS
>
> static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 02d8a251c476..e0335e44f76a 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -6,6 +6,7 @@
> #include <linux/slab.h>
> #include <linux/security.h>
> #include <linux/syscalls.h>
> +#include <linux/user_namespace.h>
> #include <asm/uaccess.h>
>
> /* init to 2 - one for init_task, one to ensure it is never freed */
> @@ -217,7 +218,11 @@ bool may_setgroups(void)
> {
> struct user_namespace *user_ns = current_user_ns();
>
> - return ns_capable(user_ns, CAP_SETGID);
> + /* It is not safe to use setgroups until a gid mapping in
> + * the user namespace has been established.
> + */
> + return userns_gid_mappings_established(user_ns) &&
> + ns_capable(user_ns, CAP_SETGID);
> }
>
> /*
> --
> 1.9.1
>

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:12:41

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings


If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace reduces the set of credentials that must be verified
can be obtained without privielge, making code verification simpler.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index da1eeb927b21..413f60fd5983 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
struct uid_gid_map *new_map)
{
+ const struct cred *cred = file->f_cred;
/* Don't allow mappings that would allow anything that wouldn't
* be allowed without the establishment of unprivileged mappings.
*/
- if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+ if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+ uid_eq(ns->owner, cred->euid)) {
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
kuid_t uid = make_kuid(ns->parent, id);
- if (uid_eq(uid, file->f_cred->euid))
+ if (uid_eq(uid, cred->euid))
return true;
}
}
--
1.9.1

2014-12-08 22:12:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping

On Mon, Dec 8, 2014 at 2:08 PM, Eric W. Biederman <[email protected]> wrote:
>
> setresuid allows the euid to be set to any of uid, euid, suid, and
> fsuid. Therefor it is safe to allow an unprivileged user to map their
> euid and use CAP_SETUID privileged with exactly that uid, as no new
> credentials can be obtained.
>
> I can not find a combination of existing system calls that allows
> setting uid, euid, suid, and fsuid from the fsuid making the previous
> use of fsuid for allowing unprivileged mappings a bug.

Right.

>
> This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <[email protected]>

>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/user_namespace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 8e7c87162171..da1eeb927b21 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
> u32 id = new_map->extent[0].lower_first;
> if (cap_setid == CAP_SETUID) {
> kuid_t uid = make_kuid(ns->parent, id);
> - if (uid_eq(uid, file->f_cred->fsuid))
> + if (uid_eq(uid, file->f_cred->euid))
> return true;
> }
> }
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:13:37

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis


- Expose the knob to user space through a proc file /proc/<pid>/setgroups

A value of 0 means the setgroups system call is disabled in the
current processes user namespace and can not be enabled in the
future in this user namespace.

A value of 1 means the segtoups system call is enabled.

- Descedent user namespaces inherit the value of setgroups from
their parents.

- A proc file is used (instead of a sysctl) as sysctls
currently do not pass in a struct file so file_ns_capable
is unusable.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/proc/base.c | 31 ++++++++++----
include/linux/user_namespace.h | 25 +++++++++++
kernel/groups.c | 1 +
kernel/user.c | 1 +
kernel/user_namespace.c | 97 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 147 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..4ebed9f01d97 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
#endif /* CONFIG_TASK_IO_ACCOUNTING */

#ifdef CONFIG_USER_NS
-static int proc_id_map_open(struct inode *inode, struct file *file,
+static int proc_userns_open(struct inode *inode, struct file *file,
const struct seq_operations *seq_ops)
{
struct user_namespace *ns = NULL;
@@ -2418,7 +2418,7 @@ err:
return ret;
}

-static int proc_id_map_release(struct inode *inode, struct file *file)
+static int proc_userns_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
struct user_namespace *ns = seq->private;
@@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)

static int proc_uid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+ return proc_userns_open(inode, file, &proc_uid_seq_operations);
}

static int proc_gid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+ return proc_userns_open(inode, file, &proc_gid_seq_operations);
}

static int proc_projid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_projid_seq_operations);
+ return proc_userns_open(inode, file, &proc_projid_seq_operations);
}

static const struct file_operations proc_uid_map_operations = {
@@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
.write = proc_uid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
};

static const struct file_operations proc_gid_map_operations = {
@@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
.write = proc_gid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
};

static const struct file_operations proc_projid_map_operations = {
@@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
.write = proc_projid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
+};
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+ return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
+}
+
+static const struct file_operations proc_setgroups_operations = {
+ .open = proc_setgroups_open,
+ .write = proc_setgroups_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = proc_userns_release,
};
#endif /* CONFIG_USER_NS */

@@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+ REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
#endif
#ifdef CONFIG_CHECKPOINT_RESTORE
REG("timers", S_IRUGO, proc_timers_operations),
@@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+ REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
#endif
};

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 41cc26e5a350..6451c401dcf6 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,12 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
} extent[UID_GID_MAP_MAX_EXTENTS];
};

+enum user_namespace_flags {
+ USERNS_SETGROUPS_ALLOWED,
+};
+
+#define USERNS_INIT_FLAGS BIT(USERNS_SETGROUPS_ALLOWED)
+
struct user_namespace {
struct uid_gid_map uid_map;
struct uid_gid_map gid_map;
@@ -27,6 +33,7 @@ struct user_namespace {
kuid_t owner;
kgid_t group;
unsigned int proc_inum;
+ unsigned long flags;

/* Register of per-UID persistent keyrings for this namespace */
#ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -46,6 +53,22 @@ static inline bool userns_gid_mappings_established(const struct user_namespace *
return established;
}

+static inline bool userns_setgroups_allowed(const struct user_namespace *ns)
+{
+ bool allowed;
+ smp_mb__before_atomic();
+ allowed = test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
+ smp_mb__after_atomic();
+ return allowed;
+}
+
+static inline void userns_disable_setgroups(struct user_namespace *ns)
+{
+ smp_mb__before_atomic();
+ clear_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
+ smp_mb__after_atomic();
+}
+
#ifdef CONFIG_USER_NS

static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -69,9 +92,11 @@ struct seq_operations;
extern const struct seq_operations proc_uid_seq_operations;
extern const struct seq_operations proc_gid_seq_operations;
extern const struct seq_operations proc_projid_seq_operations;
+extern const struct seq_operations proc_setgroups_seq_operations;
extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
#else

static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
diff --git a/kernel/groups.c b/kernel/groups.c
index e0335e44f76a..2f136fda7c4d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -222,6 +222,7 @@ bool may_setgroups(void)
* the user namespace has been established.
*/
return userns_gid_mappings_established(user_ns) &&
+ userns_setgroups_allowed(user_ns) &&
ns_capable(user_ns, CAP_SETGID);
}

diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..2d09940c9632 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
.owner = GLOBAL_ROOT_UID,
.group = GLOBAL_ROOT_GID,
.proc_inum = PROC_USER_INIT_INO,
+ .flags = USERNS_INIT_FLAGS,
#ifdef CONFIG_PERSISTENT_KEYRINGS
.persistent_keyring_register_sem =
__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 413f60fd5983..3d128f91ced3 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -98,6 +98,11 @@ int create_user_ns(struct cred *new)
ns->level = parent_ns->level + 1;
ns->owner = owner;
ns->group = group;
+ ns->flags = USERNS_INIT_FLAGS;
+
+ /* Copy USERNS_SETGROUPS_ALLOWED from the parent user namespace */
+ if (!userns_setgroups_allowed(parent_ns))
+ userns_disable_setgroups(ns);

set_cred_user_ns(new, ns);

@@ -841,6 +846,98 @@ static bool new_idmap_permitted(const struct file *file,
return false;
}

+static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
+{
+ struct user_namespace *ns = seq->private;
+
+ return (*ppos == 0) ? ns : NULL;
+}
+
+static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+ ++*ppos;
+ return NULL;
+}
+
+static void setgroups_m_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int setgroups_m_show(struct seq_file *seq, void *v)
+{
+ struct user_namespace *ns = seq->private;
+
+ seq_printf(seq, "%s\n",
+ test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ?
+ "allow" : "deny");
+ return 0;
+}
+
+const struct seq_operations proc_setgroups_seq_operations = {
+ .start = setgroups_m_start,
+ .stop = setgroups_m_stop,
+ .next = setgroups_m_next,
+ .show = setgroups_m_show,
+};
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *seq = file->private_data;
+ struct user_namespace *ns = seq->private;
+ char kbuf[8], *pos;
+ bool setgroups_allowed;
+ ssize_t ret;
+
+ ret = -EACCES;
+ if (!file_ns_capable(file, ns, CAP_SYS_ADMIN))
+ goto out;
+
+ /* Only allow a very narrow range of strings to be written */
+ ret = -EINVAL;
+ if ((*ppos != 0) || (count >= sizeof(kbuf)))
+ goto out;
+
+ /* What was written? */
+ ret = -EFAULT;
+ if (copy_from_user(kbuf, buf, count))
+ goto out;
+ kbuf[count] = '\0';
+ pos = kbuf;
+
+ /* What is being requested? */
+ ret = -EINVAL;
+ if (strncmp(pos, "allow", 5) == 0) {
+ pos += 5;
+ setgroups_allowed = true;
+ }
+ else if (strncmp(pos, "deny", 4) == 0) {
+ pos += 4;
+ setgroups_allowed = false;
+ }
+ else
+ goto out;
+
+ /* Verify there is not trailing junk on the line */
+ pos = skip_spaces(pos);
+ if (*pos != '\0')
+ goto out;
+
+ if (setgroups_allowed) {
+ ret = -EPERM;
+ if (!userns_setgroups_allowed(ns))
+ goto out;
+ } else {
+ userns_disable_setgroups(ns);
+ }
+
+ /* Report a successful write */
+ *ppos = count;
+ ret = count;
+out:
+ return ret;
+}
+
static void *userns_get(struct task_struct *task)
{
struct user_namespace *user_ns;
--
1.9.1

2014-12-08 22:16:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings

On Mon, Dec 8, 2014 at 2:10 PM, Eric W. Biederman <[email protected]> wrote:
>
> If you did not create the user namespace and are allowed
> to write to uid_map or gid_map you should already have the necessary
> privilege in the parent user namespace to establish any mapping
> you want so this will not affect userspace in practice.
>
> Limiting unprivileged uid mapping establishment to the creator of the
> user namespace reduces the set of credentials that must be verified
> can be obtained without privielge, making code verification simpler.
>

s/privielge/privilege/

But I still can't parse that sentence.

The code itself is:

Reviewed-by: Andy Lutomirski <[email protected]>

> Limiting unprivileged gid mapping establishment (which is temporarily
> absent) to the creator of the user namespace also ensures that the
> combination of uid and gid can already be obtained without privilege.
>
> This is part of the fix for CVE-2014-8989.
>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/user_namespace.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index da1eeb927b21..413f60fd5983 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
> struct user_namespace *ns, int cap_setid,
> struct uid_gid_map *new_map)
> {
> + const struct cred *cred = file->f_cred;
> /* Don't allow mappings that would allow anything that wouldn't
> * be allowed without the establishment of unprivileged mappings.
> */
> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
> + uid_eq(ns->owner, cred->euid)) {
> u32 id = new_map->extent[0].lower_first;
> if (cap_setid == CAP_SETUID) {
> kuid_t uid = make_kuid(ns->parent, id);
> - if (uid_eq(uid, file->f_cred->euid))
> + if (uid_eq(uid, cred->euid))
> return true;
> }
> }
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:16:41

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled


Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivilege
setting of gid_map was removed. Applications that use this
functionality will need to check to see if they use setgroups or
init_groups, and if they don't they can be fixed by simply
disabling of setgroups before they run.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 3d128f91ced3..459c7f647072 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file,
kuid_t uid = make_kuid(ns->parent, id);
if (uid_eq(uid, cred->euid))
return true;
+ } else if (cap_setid == CAP_SETGID) {
+ kgid_t gid = make_kgid(ns->parent, id);
+ if (!userns_setgroups_allowed(ns) &&
+ gid_eq(gid, cred->egid))
+ return true;
}
}

--
1.9.1

2014-12-08 22:17:42

by Richard Weinberger

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished

Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>
> setgroups is unique in not needing a valid mapping before it can be called,
> in the case of setgroups(0, NULL) which drops all supplemental groups.
>
> The design of the user namespace assumes that CAP_SETGID can not actually
> be used until a gid mapping is established. Therefore add a helper function
> to see if the user namespace gid mapping has been established and call
> that function in the setgroups permission check.
>
> This is part of the fix for CVE-2014-8989, being able to drop groups
> without privilege using user namespaces.
>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> include/linux/user_namespace.h | 9 +++++++++
> kernel/groups.c | 7 ++++++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..41cc26e5a350 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -37,6 +37,15 @@ struct user_namespace {
>
> extern struct user_namespace init_user_ns;
>
> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
> +{
> + bool established;
> + smp_mb__before_atomic();
> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
> + smp_mb__after_atomic();
> + return established;
> +}
> +

Maybe this is a stupid question, but why do we need all this magic
around established = ... ?
The purpose of this code is to check whether ns->gid_map.nr_extents != 0
in a lock-free manner?

Thanks,
//richard

2014-12-08 22:22:25

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis

On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <[email protected]> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
> A value of 0 means the setgroups system call is disabled in the

"deny"

> current processes user namespace and can not be enabled in the
> future in this user namespace.
>
> A value of 1 means the segtoups system call is enabled.
>

"allow"

> - Descedent user namespaces inherit the value of setgroups from

s/Descedent/Descendent/

> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -222,6 +222,7 @@ bool may_setgroups(void)
> * the user namespace has been established.
> */
> return userns_gid_mappings_established(user_ns) &&
> + userns_setgroups_allowed(user_ns) &&
> ns_capable(user_ns, CAP_SETGID);
> }

Can you add a comment explaining the ordering? For example:

We need to check for a gid mapping before checking setgroups_allowed
because an unprivileged user can create a userns with setgroups
allowed, then disallow setgroups and add a mapping. If we check in
the opposite order, then we have a race: we could see that setgroups
is allowed before the user clears the bit and then see that there is a
gid mapping after the other thread is done.

--Andy


--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:26:02

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished

On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <[email protected]> wrote:
> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>
>> setgroups is unique in not needing a valid mapping before it can be called,
>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>
>> The design of the user namespace assumes that CAP_SETGID can not actually
>> be used until a gid mapping is established. Therefore add a helper function
>> to see if the user namespace gid mapping has been established and call
>> that function in the setgroups permission check.
>>
>> This is part of the fix for CVE-2014-8989, being able to drop groups
>> without privilege using user namespaces.
>>
>> Cc: [email protected]
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>> ---
>> include/linux/user_namespace.h | 9 +++++++++
>> kernel/groups.c | 7 ++++++-
>> 2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index e95372654f09..41cc26e5a350 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -37,6 +37,15 @@ struct user_namespace {
>>
>> extern struct user_namespace init_user_ns;
>>
>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>> +{
>> + bool established;
>> + smp_mb__before_atomic();
>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>> + smp_mb__after_atomic();
>> + return established;
>> +}
>> +
>
> Maybe this is a stupid question, but why do we need all this magic
> around established = ... ?
> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
> in a lock-free manner?
>

See my other comment -- the ordering will matter at the end of the series.

It might be nicer to do this differently: in may_setgroups, do:

if (!userns_gid_mappings_established)
return false;

/* User code can start with setgroups allowed, disallow it, and then
add a mapping. We need to prevent a race that could cause this
function to return true. */
smp_rmb();

if (!userns_setgroups_allowed)
return false;

--Andy

> Thanks,
> //richard



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:27:10

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled

On Mon, Dec 8, 2014 at 2:14 PM, Eric W. Biederman <[email protected]> wrote:
>
> Now that setgroups can be disabled and not reenabled, setting gid_map
> without privielge can now be enabled when setgroups is disabled.
>
> This restores most of the functionality that was lost when unprivilege

unprivileged.

> setting of gid_map was removed. Applications that use this
> functionality will need to check to see if they use setgroups or
> init_groups, and if they don't they can be fixed by simply
> disabling of setgroups before they run.

"disabling setgroups before writing to gid_map"?

The code is:

Reviewed-by: Andy Lutomirski <[email protected]>

>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/user_namespace.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 3d128f91ced3..459c7f647072 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file,
> kuid_t uid = make_kuid(ns->parent, id);
> if (uid_eq(uid, cred->euid))
> return true;
> + } else if (cap_setid == CAP_SETGID) {
> + kgid_t gid = make_kgid(ns->parent, id);
> + if (!userns_setgroups_allowed(ns) &&
> + gid_eq(gid, cred->egid))
> + return true;
> }
> }
>
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:27:54

by Richard Weinberger

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished

Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <[email protected]> wrote:
>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>
>>> setgroups is unique in not needing a valid mapping before it can be called,
>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>
>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>> be used until a gid mapping is established. Therefore add a helper function
>>> to see if the user namespace gid mapping has been established and call
>>> that function in the setgroups permission check.
>>>
>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>> without privilege using user namespaces.
>>>
>>> Cc: [email protected]
>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>> ---
>>> include/linux/user_namespace.h | 9 +++++++++
>>> kernel/groups.c | 7 ++++++-
>>> 2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index e95372654f09..41cc26e5a350 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>
>>> extern struct user_namespace init_user_ns;
>>>
>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>> +{
>>> + bool established;
>>> + smp_mb__before_atomic();
>>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>> + smp_mb__after_atomic();
>>> + return established;
>>> +}
>>> +
>>
>> Maybe this is a stupid question, but why do we need all this magic
>> around established = ... ?
>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>> in a lock-free manner?
>>
>
> See my other comment -- the ordering will matter at the end of the series.

But ns->gid_map.nr_extents is not atomic, it is a plain u32.
This confuses me.

Thanks,
//richard

2014-12-08 22:33:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished

On Mon, Dec 8, 2014 at 2:26 PM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <[email protected]> wrote:
>>>
>>> setgroups is unique in not needing a valid mapping before it can be called,
>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>
>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>> be used until a gid mapping is established. Therefore add a helper function
>>> to see if the user namespace gid mapping has been established and call
>>> that function in the setgroups permission check.
>>>
>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>> without privilege using user namespaces.
>>>
>>> Cc: [email protected]
>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>> ---
>>> include/linux/user_namespace.h | 9 +++++++++
>>> kernel/groups.c | 7 ++++++-
>>> 2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index e95372654f09..41cc26e5a350 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>
>>> extern struct user_namespace init_user_ns;
>>>
>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>> +{
>>> + bool established;
>>> + smp_mb__before_atomic();
>>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>> + smp_mb__after_atomic();
>>> + return established;
>>> +}
>>
>> I don't think this works on all platforms. ACCESS_ONCE is not atomic
>> in the smp_mb__before_atomic sense.
>
> Documentation/atomic_ops.txt documents ACCESS_ONCE as being equivalent
> to atomic_read() and atomic_set(). smp_mb__before_atomic and
> smp_mb__after_atomic() are Documented as working with atomic_read and
> atomic_set. Maybe it is a stretch to use them but it doesn't seem like
> much of a stretch.

I don't fully understand the design there. I think this is an attempt
to work around the fact that test_bit is fully atomic on x86 but not
elsewhere.

>
> Further at this point I don't know that any barriers are strictly
> needed, beyond the ACCESS_ONCE. However since x86 does all of the
> ordering in hardware that I need I am not going to find any bugs that
> don't require a barrier.
>
> All I really want is the same level of barriers I would get if I used a
> spin-lock protected data structure so I don't need to worry about
> crazy smp issues that happen when the hardware decides it is safe to
> reorder things.

Use smp_rmb(), I think. It'll be obviously correct, and the
performance impact really doesn't matter.

Also, on platforms where this stuff matters, the barrier in
smp_mb__whatever will be a full fence, whereas smp_rmb may be lighter
weight.

--Andy

>
> Eric
>
>
>>> +
>>> #ifdef CONFIG_USER_NS
>>>
>>> static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
>>> diff --git a/kernel/groups.c b/kernel/groups.c
>>> index 02d8a251c476..e0335e44f76a 100644
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -6,6 +6,7 @@
>>> #include <linux/slab.h>
>>> #include <linux/security.h>
>>> #include <linux/syscalls.h>
>>> +#include <linux/user_namespace.h>
>>> #include <asm/uaccess.h>
>>>
>>> /* init to 2 - one for init_task, one to ensure it is never freed */
>>> @@ -217,7 +218,11 @@ bool may_setgroups(void)
>>> {
>>> struct user_namespace *user_ns = current_user_ns();
>>>
>>> - return ns_capable(user_ns, CAP_SETGID);
>>> + /* It is not safe to use setgroups until a gid mapping in
>>> + * the user namespace has been established.
>>> + */
>>> + return userns_gid_mappings_established(user_ns) &&
>>> + ns_capable(user_ns, CAP_SETGID);
>>> }
>>>
>>> /*
>>> --
>>> 1.9.1
>>>
>>
>> --Andy



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:46:43

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis

Andy Lutomirski <[email protected]> writes:

> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <[email protected]> wrote:
>>
>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>
>> A value of 0 means the setgroups system call is disabled in the
>
> "deny"
>
>> current processes user namespace and can not be enabled in the
>> future in this user namespace.
>>
>> A value of 1 means the segtoups system call is enabled.
>>
>
> "allow"
>
>> - Descedent user namespaces inherit the value of setgroups from
>
> s/Descedent/Descendent/

Bah. I updated everything but the changelog comment.

>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>> * the user namespace has been established.
>> */
>> return userns_gid_mappings_established(user_ns) &&
>> + userns_setgroups_allowed(user_ns) &&
>> ns_capable(user_ns, CAP_SETGID);
>> }
>
> Can you add a comment explaining the ordering? For example:

I need to think on what I can say to make it clear.
Perhaps: /* Careful the order of these checks is important. */

> We need to check for a gid mapping before checking setgroups_allowed
> because an unprivileged user can create a userns with setgroups
> allowed, then disallow setgroups and add a mapping. If we check in
> the opposite order, then we have a race: we could see that setgroups
> is allowed before the user clears the bit and then see that there is a
> gid mapping after the other thread is done.

Since these are independent atomic variables yes that ordering issue
seems to be the case.

For me it was the natural ordering of the checks so I didn't even bother
to think about what happens when you reorder them.

Eric

2014-12-08 22:48:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished

On Mon, Dec 8, 2014 at 2:39 PM, Eric W. Biederman <[email protected]> wrote:
> Richard Weinberger <[email protected]> writes:
>
>> Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
>>> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <[email protected]> wrote:
>>>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>>>
>>>>> setgroups is unique in not needing a valid mapping before it can be called,
>>>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>>>
>>>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>>>> be used until a gid mapping is established. Therefore add a helper function
>>>>> to see if the user namespace gid mapping has been established and call
>>>>> that function in the setgroups permission check.
>>>>>
>>>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>>>> without privilege using user namespaces.
>>>>>
>>>>> Cc: [email protected]
>>>>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>>>>> ---
>>>>> include/linux/user_namespace.h | 9 +++++++++
>>>>> kernel/groups.c | 7 ++++++-
>>>>> 2 files changed, 15 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>>>> index e95372654f09..41cc26e5a350 100644
>>>>> --- a/include/linux/user_namespace.h
>>>>> +++ b/include/linux/user_namespace.h
>>>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>>>
>>>>> extern struct user_namespace init_user_ns;
>>>>>
>>>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>>>> +{
>>>>> + bool established;
>>>>> + smp_mb__before_atomic();
>>>>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>>>> + smp_mb__after_atomic();
>>>>> + return established;
>>>>> +}
>>>>> +
>>>>
>>>> Maybe this is a stupid question, but why do we need all this magic
>>>> around established = ... ?
>>>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>>>> in a lock-free manner?
>>>>
>>>
>>> See my other comment -- the ordering will matter at the end of the series.
>>
>> But ns->gid_map.nr_extents is not atomic, it is a plain u32.
>> This confuses me.
>
> Read Documentation/atomic_ops.txt a plain u32 is atomic by definiton.
>

I still don't understand why the helper changed to smp_mb__before_atomic.

> Which is a little bit convoluted. However that is part of the of the
> gid mapping path and I optimized that as far as I humanly could so that
> calls like stat don't take a noticable slow donw.
>
> On this path we don't particularly care except that I am using an the
> existing data structure.

As an example, arm64 defines both smp_mb__before_atomic and
smp_mb__after_atomic as smp_mb(), which is heavier then smp_rmb(), and
there are two of them. So I still like the explicit smp_rmb() better.

--Andy

>
> Eric
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 22:49:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis

On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <[email protected]> wrote:
>>>
>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>
>>> A value of 0 means the setgroups system call is disabled in the
>>
>> "deny"
>>
>>> current processes user namespace and can not be enabled in the
>>> future in this user namespace.
>>>
>>> A value of 1 means the segtoups system call is enabled.
>>>
>>
>> "allow"
>>
>>> - Descedent user namespaces inherit the value of setgroups from
>>
>> s/Descedent/Descendent/
>
> Bah. I updated everything but the changelog comment.
>
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>> * the user namespace has been established.
>>> */
>>> return userns_gid_mappings_established(user_ns) &&
>>> + userns_setgroups_allowed(user_ns) &&
>>> ns_capable(user_ns, CAP_SETGID);
>>> }
>>
>> Can you add a comment explaining the ordering? For example:
>
> I need to think on what I can say to make it clear.
> Perhaps: /* Careful the order of these checks is important. */
>
>> We need to check for a gid mapping before checking setgroups_allowed
>> because an unprivileged user can create a userns with setgroups
>> allowed, then disallow setgroups and add a mapping. If we check in
>> the opposite order, then we have a race: we could see that setgroups
>> is allowed before the user clears the bit and then see that there is a
>> gid mapping after the other thread is done.
>

This text was actually my suggested comment text.

If you put smp_rmb() in this function with a comment like that, then I
think it will all make sense and be obviously correct (even with most
of the other barriers removed).

--Andy

> Since these are independent atomic variables yes that ordering issue
> seems to be the case.
>
> For me it was the natural ordering of the checks so I didn't even bother
> to think about what happens when you reorder them.
>
> Eric



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-08 23:32:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis

Andy Lutomirski <[email protected]> writes:

> On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <[email protected]> wrote:
>> Andy Lutomirski <[email protected]> writes:
>>
>>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <[email protected]> wrote:
>>>>
>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>>
>>>> A value of 0 means the setgroups system call is disabled in the
>>>
>>> "deny"
>>>
>>>> current processes user namespace and can not be enabled in the
>>>> future in this user namespace.
>>>>
>>>> A value of 1 means the segtoups system call is enabled.
>>>>
>>>
>>> "allow"
>>>
>>>> - Descedent user namespaces inherit the value of setgroups from
>>>
>>> s/Descedent/Descendent/
>>
>> Bah. I updated everything but the changelog comment.
>>
>>>> --- a/kernel/groups.c
>>>> +++ b/kernel/groups.c
>>>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>>> * the user namespace has been established.
>>>> */
>>>> return userns_gid_mappings_established(user_ns) &&
>>>> + userns_setgroups_allowed(user_ns) &&
>>>> ns_capable(user_ns, CAP_SETGID);
>>>> }
>>>
>>> Can you add a comment explaining the ordering? For example:
>>
>> I need to think on what I can say to make it clear.
>> Perhaps: /* Careful the order of these checks is important. */
>>
>>> We need to check for a gid mapping before checking setgroups_allowed
>>> because an unprivileged user can create a userns with setgroups
>>> allowed, then disallow setgroups and add a mapping. If we check in
>>> the opposite order, then we have a race: we could see that setgroups
>>> is allowed before the user clears the bit and then see that there is a
>>> gid mapping after the other thread is done.
>>
>
> This text was actually my suggested comment text.

Now I see.

> If you put smp_rmb() in this function with a comment like that, then I
> think it will all make sense and be obviously correct (even with most
> of the other barriers removed).

Right.

Given that we have to be careful when using these things anyway what
I was hoping to achieve with the barriers appears impossible, and
confusing so I will see about just adding barriers where we need them
for real. Sigh.

Eric

2014-12-09 19:34:10

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis

[email protected] (Eric W. Biederman) writes:

> Andy Lutomirski <[email protected]> writes:
>
>>
>> This text was actually my suggested comment text.
>
> Now I see.
>
>> If you put smp_rmb() in this function with a comment like that, then I
>> think it will all make sense and be obviously correct (even with most
>> of the other barriers removed).
>
> Right.
>
> Given that we have to be careful when using these things anyway what
> I was hoping to achieve with the barriers appears impossible, and
> confusing so I will see about just adding barriers where we need them
> for real. Sigh.

Doh. The code has been entirely too clever.

There are no need for atomics or other cleverness, I just need to
generalize id_map_mutex. I knew that had to be a trivially correct
way of handling this mess.

Eric

2014-12-09 20:39:10

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings.


The rule is simple. Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it. So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated. Violated expectations about the
behavior of the OS is a long way to say a security issue.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..b99c862a2e3f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
struct uid_gid_map *new_map)
{
- /* Allow mapping to your own filesystem ids */
+ /* Don't allow mappings that would allow anything that wouldn't
+ * be allowed without the establishment of unprivileged mappings.
+ */
if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
--
1.9.1

2014-12-09 20:40:44

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished


setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established. Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/user_namespace.h | 5 +++++
kernel/groups.c | 4 +++-
kernel/user_namespace.c | 14 ++++++++++++++
3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..8d493083486a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations;
extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern bool userns_may_setgroups(const struct user_namespace *ns);
#else

static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns)
{
}

+static inline bool userns_may_setgroups(const struct user_namespace *ns)
+{
+ return true;
+}
#endif

#endif /* _LINUX_USER_H */
diff --git a/kernel/groups.c b/kernel/groups.c
index 02d8a251c476..664411f171b5 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
#include <linux/slab.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/* init to 2 - one for init_task, one to ensure it is never freed */
@@ -217,7 +218,8 @@ bool may_setgroups(void)
{
struct user_namespace *user_ns = current_user_ns();

- return ns_capable(user_ns, CAP_SETGID);
+ return ns_capable(user_ns, CAP_SETGID) &&
+ userns_may_setgroups(user_ns);
}

/*
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b99c862a2e3f..27c8dab48c07 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file,
return false;
}

+bool userns_may_setgroups(const struct user_namespace *ns)
+{
+ bool allowed;
+
+ mutex_lock(&id_map_mutex);
+ /* It is not safe to use setgroups until a gid mapping in
+ * the user namespace has been established.
+ */
+ allowed = ns->gid_map.nr_extents != 0;
+ mutex_unlock(&id_map_mutex);
+
+ return allowed;
+}
+
static void *userns_get(struct task_struct *task)
{
struct user_namespace *user_ns;
--
1.9.1

2014-12-09 20:41:47

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings


As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality. This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups. Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 27c8dab48c07..1ce6d67c07b7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
kuid_t uid = make_kuid(ns->parent, id);
if (uid_eq(uid, file->f_cred->fsuid))
return true;
- } else if (cap_setid == CAP_SETGID) {
- kgid_t gid = make_kgid(ns->parent, id);
- if (gid_eq(gid, file->f_cred->fsgid))
- return true;
}
}

--
1.9.1

2014-12-09 20:42:21

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping


setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid. Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.

I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Cc: [email protected]
Reviewed-by: Andy Lutomirski <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 1ce6d67c07b7..9451b12a9b6c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
kuid_t uid = make_kuid(ns->parent, id);
- if (uid_eq(uid, file->f_cred->fsuid))
+ if (uid_eq(uid, file->f_cred->euid))
return true;
}
}
--
1.9.1

2014-12-09 20:43:43

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings


If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Cc: [email protected]
Reviewed-by: Andy Lutomirski <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9451b12a9b6c..1e34de2fbd60 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
struct uid_gid_map *new_map)
{
+ const struct cred *cred = file->f_cred;
/* Don't allow mappings that would allow anything that wouldn't
* be allowed without the establishment of unprivileged mappings.
*/
- if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+ if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+ uid_eq(ns->owner, cred->euid)) {
u32 id = new_map->extent[0].lower_first;
if (cap_setid == CAP_SETUID) {
kuid_t uid = make_kuid(ns->parent, id);
- if (uid_eq(uid, file->f_cred->euid))
+ if (uid_eq(uid, cred->euid))
return true;
}
}
--
1.9.1

2014-12-09 20:44:20

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex


Generalize id_map_mutex so it can be used for more state of a user namespace.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 1e34de2fbd60..44a555ac6104 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -24,6 +24,7 @@
#include <linux/fs_struct.h>

static struct kmem_cache *user_ns_cachep __read_mostly;
+static DEFINE_MUTEX(userns_state_mutex);

static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
@@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map,
return false;
}

-
-static DEFINE_MUTEX(id_map_mutex);
-
static ssize_t map_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos,
int cap_setid,
@@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
ssize_t ret = -EINVAL;

/*
- * The id_map_mutex serializes all writes to any given map.
+ * The userns_state_mutex serializes all writes to any given map.
*
* Any map is only ever written once.
*
@@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
* order and smp_rmb() is guaranteed that we don't have crazy
* architectures returning stale data.
*/
- mutex_lock(&id_map_mutex);
+ mutex_lock(&userns_state_mutex);

ret = -EPERM;
/* Only allow one successful write to the map */
@@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
*ppos = count;
ret = count;
out:
- mutex_unlock(&id_map_mutex);
+ mutex_unlock(&userns_state_mutex);
if (page)
free_page(page);
return ret;
@@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns)
{
bool allowed;

- mutex_lock(&id_map_mutex);
+ mutex_lock(&userns_state_mutex);
/* It is not safe to use setgroups until a gid mapping in
* the user namespace has been established.
*/
allowed = ns->gid_map.nr_extents != 0;
- mutex_unlock(&id_map_mutex);
+ mutex_unlock(&userns_state_mutex);

return allowed;
}
--
1.9.1

2014-12-09 20:44:57

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis


- Expose the knob to user space through a proc file /proc/<pid>/setgroups

A value of "deny" means the setgroups system call is disabled in the
current processes user namespace and can not be enabled in the
future in this user namespace.

A value of "allow" means the segtoups system call is enabled.

- Descendant user namespaces inherit the value of setgroups from
their parents.

- A proc file is used (instead of a sysctl) as sysctls
currently do not pass in a struct file so file_ns_capable
is unusable.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/proc/base.c | 31 +++++++++----
include/linux/user_namespace.h | 7 +++
kernel/user.c | 1 +
kernel/user_namespace.c | 103 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..4ebed9f01d97 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
#endif /* CONFIG_TASK_IO_ACCOUNTING */

#ifdef CONFIG_USER_NS
-static int proc_id_map_open(struct inode *inode, struct file *file,
+static int proc_userns_open(struct inode *inode, struct file *file,
const struct seq_operations *seq_ops)
{
struct user_namespace *ns = NULL;
@@ -2418,7 +2418,7 @@ err:
return ret;
}

-static int proc_id_map_release(struct inode *inode, struct file *file)
+static int proc_userns_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
struct user_namespace *ns = seq->private;
@@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)

static int proc_uid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+ return proc_userns_open(inode, file, &proc_uid_seq_operations);
}

static int proc_gid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+ return proc_userns_open(inode, file, &proc_gid_seq_operations);
}

static int proc_projid_map_open(struct inode *inode, struct file *file)
{
- return proc_id_map_open(inode, file, &proc_projid_seq_operations);
+ return proc_userns_open(inode, file, &proc_projid_seq_operations);
}

static const struct file_operations proc_uid_map_operations = {
@@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
.write = proc_uid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
};

static const struct file_operations proc_gid_map_operations = {
@@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
.write = proc_gid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
};

static const struct file_operations proc_projid_map_operations = {
@@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
.write = proc_projid_map_write,
.read = seq_read,
.llseek = seq_lseek,
- .release = proc_id_map_release,
+ .release = proc_userns_release,
+};
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+ return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
+}
+
+static const struct file_operations proc_setgroups_operations = {
+ .open = proc_setgroups_open,
+ .write = proc_setgroups_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = proc_userns_release,
};
#endif /* CONFIG_USER_NS */

@@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+ REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
#endif
#ifdef CONFIG_CHECKPOINT_RESTORE
REG("timers", S_IRUGO, proc_timers_operations),
@@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+ REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
#endif
};

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8d493083486a..feb0f4ec5573 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
} extent[UID_GID_MAP_MAX_EXTENTS];
};

+#define USERNS_SETGROUPS_ALLOWED 1UL
+
+#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
+
struct user_namespace {
struct uid_gid_map uid_map;
struct uid_gid_map gid_map;
@@ -27,6 +31,7 @@ struct user_namespace {
kuid_t owner;
kgid_t group;
unsigned int proc_inum;
+ unsigned long flags;

/* Register of per-UID persistent keyrings for this namespace */
#ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -60,9 +65,11 @@ struct seq_operations;
extern const struct seq_operations proc_uid_seq_operations;
extern const struct seq_operations proc_gid_seq_operations;
extern const struct seq_operations proc_projid_seq_operations;
+extern const struct seq_operations proc_setgroups_seq_operations;
extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
extern bool userns_may_setgroups(const struct user_namespace *ns);
#else

diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..2d09940c9632 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
.owner = GLOBAL_ROOT_UID,
.group = GLOBAL_ROOT_GID,
.proc_inum = PROC_USER_INIT_INO,
+ .flags = USERNS_INIT_FLAGS,
#ifdef CONFIG_PERSISTENT_KEYRINGS
.persistent_keyring_register_sem =
__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 44a555ac6104..b507f9af7ff2 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -100,6 +100,11 @@ int create_user_ns(struct cred *new)
ns->owner = owner;
ns->group = group;

+ /* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
+ mutex_lock(&userns_state_mutex);
+ ns->flags = parent_ns->flags;
+ mutex_unlock(&userns_state_mutex);
+
set_cred_user_ns(new, ns);

#ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -839,6 +844,102 @@ static bool new_idmap_permitted(const struct file *file,
return false;
}

+static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
+{
+ struct user_namespace *ns = seq->private;
+
+ return (*ppos == 0) ? ns : NULL;
+}
+
+static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+ ++*ppos;
+ return NULL;
+}
+
+static void setgroups_m_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int setgroups_m_show(struct seq_file *seq, void *v)
+{
+ struct user_namespace *ns = seq->private;
+
+ seq_printf(seq, "%s\n",
+ test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ?
+ "allow" : "deny");
+ return 0;
+}
+
+const struct seq_operations proc_setgroups_seq_operations = {
+ .start = setgroups_m_start,
+ .stop = setgroups_m_stop,
+ .next = setgroups_m_next,
+ .show = setgroups_m_show,
+};
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *seq = file->private_data;
+ struct user_namespace *ns = seq->private;
+ char kbuf[8], *pos;
+ bool setgroups_allowed;
+ ssize_t ret;
+
+ ret = -EACCES;
+ if (!file_ns_capable(file, ns, CAP_SYS_ADMIN))
+ goto out;
+
+ /* Only allow a very narrow range of strings to be written */
+ ret = -EINVAL;
+ if ((*ppos != 0) || (count >= sizeof(kbuf)))
+ goto out;
+
+ /* What was written? */
+ ret = -EFAULT;
+ if (copy_from_user(kbuf, buf, count))
+ goto out;
+ kbuf[count] = '\0';
+ pos = kbuf;
+
+ /* What is being requested? */
+ ret = -EINVAL;
+ if (strncmp(pos, "allow", 5) == 0) {
+ pos += 5;
+ setgroups_allowed = true;
+ }
+ else if (strncmp(pos, "deny", 4) == 0) {
+ pos += 4;
+ setgroups_allowed = false;
+ }
+ else
+ goto out;
+
+ /* Verify there is not trailing junk on the line */
+ pos = skip_spaces(pos);
+ if (*pos != '\0')
+ goto out;
+
+ mutex_lock(&userns_state_mutex);
+ if (setgroups_allowed) {
+ ret = -EPERM;
+ if (!(ns->flags & USERNS_SETGROUPS_ALLOWED)) {
+ mutex_unlock(&userns_state_mutex);
+ goto out;
+ }
+ } else {
+ ns->flags &= ~USERNS_SETGROUPS_ALLOWED;
+ }
+ mutex_unlock(&userns_state_mutex);
+
+ /* Report a successful write */
+ *ppos = count;
+ ret = count;
+out:
+ return ret;
+}
+
bool userns_may_setgroups(const struct user_namespace *ns)
{
bool allowed;
@@ -848,6 +949,8 @@ bool userns_may_setgroups(const struct user_namespace *ns)
* the user namespace has been established.
*/
allowed = ns->gid_map.nr_extents != 0;
+ /* Is setgroups allowed? */
+ allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
mutex_unlock(&userns_state_mutex);

return allowed;
--
1.9.1

2014-12-09 20:45:38

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled


Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed. Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.

Cc: [email protected]
Reviewed-by: Andy Lutomirski <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/user_namespace.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b507f9af7ff2..3b29b9a52332 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -826,6 +826,11 @@ static bool new_idmap_permitted(const struct file *file,
kuid_t uid = make_kuid(ns->parent, id);
if (uid_eq(uid, cred->euid))
return true;
+ } else if (cap_setid == CAP_SETGID) {
+ kgid_t gid = make_kgid(ns->parent, id);
+ if (!(ns->flags & USERNS_SETGROUPS_ALLOWED) &&
+ gid_eq(gid, cred->egid))
+ return true;
}
}

--
1.9.1

2014-12-09 22:29:04

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis

On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman
<[email protected]> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
> A value of "deny" means the setgroups system call is disabled in the
> current processes user namespace and can not be enabled in the
> future in this user namespace.
>
> A value of "allow" means the segtoups system call is enabled.
>
> - Descendant user namespaces inherit the value of setgroups from
> their parents.
>
> - A proc file is used (instead of a sysctl) as sysctls
> currently do not pass in a struct file so file_ns_capable
> is unusable.

Reviewed-by: Andy Lutomirski <[email protected]>

But I still don't like the name "setgroups". People may look at that
and have no clue what the scope of the setting is. And anyone who, as
root, writes "deny" to /proc/self/setgroups, thinking that it acts on
self, will be in for a surprise.

--Andy

>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> fs/proc/base.c | 31 +++++++++----
> include/linux/user_namespace.h | 7 +++
> kernel/user.c | 1 +
> kernel/user_namespace.c | 103 +++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 134 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 772efa45a452..4ebed9f01d97 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
> #endif /* CONFIG_TASK_IO_ACCOUNTING */
>
> #ifdef CONFIG_USER_NS
> -static int proc_id_map_open(struct inode *inode, struct file *file,
> +static int proc_userns_open(struct inode *inode, struct file *file,
> const struct seq_operations *seq_ops)
> {
> struct user_namespace *ns = NULL;
> @@ -2418,7 +2418,7 @@ err:
> return ret;
> }
>
> -static int proc_id_map_release(struct inode *inode, struct file *file)
> +static int proc_userns_release(struct inode *inode, struct file *file)
> {
> struct seq_file *seq = file->private_data;
> struct user_namespace *ns = seq->private;
> @@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)
>
> static int proc_uid_map_open(struct inode *inode, struct file *file)
> {
> - return proc_id_map_open(inode, file, &proc_uid_seq_operations);
> + return proc_userns_open(inode, file, &proc_uid_seq_operations);
> }
>
> static int proc_gid_map_open(struct inode *inode, struct file *file)
> {
> - return proc_id_map_open(inode, file, &proc_gid_seq_operations);
> + return proc_userns_open(inode, file, &proc_gid_seq_operations);
> }
>
> static int proc_projid_map_open(struct inode *inode, struct file *file)
> {
> - return proc_id_map_open(inode, file, &proc_projid_seq_operations);
> + return proc_userns_open(inode, file, &proc_projid_seq_operations);
> }
>
> static const struct file_operations proc_uid_map_operations = {
> @@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
> .write = proc_uid_map_write,
> .read = seq_read,
> .llseek = seq_lseek,
> - .release = proc_id_map_release,
> + .release = proc_userns_release,
> };
>
> static const struct file_operations proc_gid_map_operations = {
> @@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
> .write = proc_gid_map_write,
> .read = seq_read,
> .llseek = seq_lseek,
> - .release = proc_id_map_release,
> + .release = proc_userns_release,
> };
>
> static const struct file_operations proc_projid_map_operations = {
> @@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
> .write = proc_projid_map_write,
> .read = seq_read,
> .llseek = seq_lseek,
> - .release = proc_id_map_release,
> + .release = proc_userns_release,
> +};
> +
> +static int proc_setgroups_open(struct inode *inode, struct file *file)
> +{
> + return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
> +}
> +
> +static const struct file_operations proc_setgroups_operations = {
> + .open = proc_setgroups_open,
> + .write = proc_setgroups_write,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = proc_userns_release,
> };
> #endif /* CONFIG_USER_NS */
>
> @@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
> REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
> REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
> REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
> + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
> #endif
> #ifdef CONFIG_CHECKPOINT_RESTORE
> REG("timers", S_IRUGO, proc_timers_operations),
> @@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
> REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations),
> REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations),
> REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
> + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations),
> #endif
> };
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 8d493083486a..feb0f4ec5573 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -17,6 +17,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
> } extent[UID_GID_MAP_MAX_EXTENTS];
> };
>
> +#define USERNS_SETGROUPS_ALLOWED 1UL
> +
> +#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
> +
> struct user_namespace {
> struct uid_gid_map uid_map;
> struct uid_gid_map gid_map;
> @@ -27,6 +31,7 @@ struct user_namespace {
> kuid_t owner;
> kgid_t group;
> unsigned int proc_inum;
> + unsigned long flags;
>
> /* Register of per-UID persistent keyrings for this namespace */
> #ifdef CONFIG_PERSISTENT_KEYRINGS
> @@ -60,9 +65,11 @@ struct seq_operations;
> extern const struct seq_operations proc_uid_seq_operations;
> extern const struct seq_operations proc_gid_seq_operations;
> extern const struct seq_operations proc_projid_seq_operations;
> +extern const struct seq_operations proc_setgroups_seq_operations;
> extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
> extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
> extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
> +extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
> extern bool userns_may_setgroups(const struct user_namespace *ns);
> #else
>
> diff --git a/kernel/user.c b/kernel/user.c
> index 4efa39350e44..2d09940c9632 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
> .owner = GLOBAL_ROOT_UID,
> .group = GLOBAL_ROOT_GID,
> .proc_inum = PROC_USER_INIT_INO,
> + .flags = USERNS_INIT_FLAGS,
> #ifdef CONFIG_PERSISTENT_KEYRINGS
> .persistent_keyring_register_sem =
> __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 44a555ac6104..b507f9af7ff2 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -100,6 +100,11 @@ int create_user_ns(struct cred *new)
> ns->owner = owner;
> ns->group = group;
>
> + /* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
> + mutex_lock(&userns_state_mutex);
> + ns->flags = parent_ns->flags;
> + mutex_unlock(&userns_state_mutex);
> +
> set_cred_user_ns(new, ns);
>
> #ifdef CONFIG_PERSISTENT_KEYRINGS
> @@ -839,6 +844,102 @@ static bool new_idmap_permitted(const struct file *file,
> return false;
> }
>
> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> + struct user_namespace *ns = seq->private;
> +
> + return (*ppos == 0) ? ns : NULL;
> +}
> +
> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
> +{
> + ++*ppos;
> + return NULL;
> +}
> +
> +static void setgroups_m_stop(struct seq_file *seq, void *v)
> +{
> +}
> +
> +static int setgroups_m_show(struct seq_file *seq, void *v)
> +{
> + struct user_namespace *ns = seq->private;
> +
> + seq_printf(seq, "%s\n",
> + test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ?
> + "allow" : "deny");
> + return 0;
> +}
> +
> +const struct seq_operations proc_setgroups_seq_operations = {
> + .start = setgroups_m_start,
> + .stop = setgroups_m_stop,
> + .next = setgroups_m_next,
> + .show = setgroups_m_show,
> +};
> +
> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct seq_file *seq = file->private_data;
> + struct user_namespace *ns = seq->private;
> + char kbuf[8], *pos;
> + bool setgroups_allowed;
> + ssize_t ret;
> +
> + ret = -EACCES;
> + if (!file_ns_capable(file, ns, CAP_SYS_ADMIN))
> + goto out;
> +
> + /* Only allow a very narrow range of strings to be written */
> + ret = -EINVAL;
> + if ((*ppos != 0) || (count >= sizeof(kbuf)))
> + goto out;
> +
> + /* What was written? */
> + ret = -EFAULT;
> + if (copy_from_user(kbuf, buf, count))
> + goto out;
> + kbuf[count] = '\0';
> + pos = kbuf;
> +
> + /* What is being requested? */
> + ret = -EINVAL;
> + if (strncmp(pos, "allow", 5) == 0) {
> + pos += 5;
> + setgroups_allowed = true;
> + }
> + else if (strncmp(pos, "deny", 4) == 0) {
> + pos += 4;
> + setgroups_allowed = false;
> + }
> + else
> + goto out;
> +
> + /* Verify there is not trailing junk on the line */
> + pos = skip_spaces(pos);
> + if (*pos != '\0')
> + goto out;
> +
> + mutex_lock(&userns_state_mutex);
> + if (setgroups_allowed) {
> + ret = -EPERM;
> + if (!(ns->flags & USERNS_SETGROUPS_ALLOWED)) {
> + mutex_unlock(&userns_state_mutex);
> + goto out;
> + }
> + } else {
> + ns->flags &= ~USERNS_SETGROUPS_ALLOWED;
> + }
> + mutex_unlock(&userns_state_mutex);
> +
> + /* Report a successful write */
> + *ppos = count;
> + ret = count;
> +out:
> + return ret;
> +}
> +
> bool userns_may_setgroups(const struct user_namespace *ns)
> {
> bool allowed;
> @@ -848,6 +949,8 @@ bool userns_may_setgroups(const struct user_namespace *ns)
> * the user namespace has been established.
> */
> allowed = ns->gid_map.nr_extents != 0;
> + /* Is setgroups allowed? */
> + allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
> mutex_unlock(&userns_state_mutex);
>
> return allowed;
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-09 22:49:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished

On Tue, Dec 9, 2014 at 12:38 PM, Eric W. Biederman
<[email protected]> wrote:
>
> setgroups is unique in not needing a valid mapping before it can be called,
> in the case of setgroups(0, NULL) which drops all supplemental groups.
>
> The design of the user namespace assumes that CAP_SETGID can not actually
> be used until a gid mapping is established. Therefore add a helper function
> to see if the user namespace gid mapping has been established and call
> that function in the setgroups permission check.
>
> This is part of the fix for CVE-2014-8989, being able to drop groups
> without privilege using user namespaces.

Reviewed-by: Andy Lutomirski <[email protected]>

>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> include/linux/user_namespace.h | 5 +++++
> kernel/groups.c | 4 +++-
> kernel/user_namespace.c | 14 ++++++++++++++
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..8d493083486a 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations;
> extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
> extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
> extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
> +extern bool userns_may_setgroups(const struct user_namespace *ns);
> #else
>
> static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns)
> {
> }
>
> +static inline bool userns_may_setgroups(const struct user_namespace *ns)
> +{
> + return true;
> +}
> #endif
>
> #endif /* _LINUX_USER_H */
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 02d8a251c476..664411f171b5 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -6,6 +6,7 @@
> #include <linux/slab.h>
> #include <linux/security.h>
> #include <linux/syscalls.h>
> +#include <linux/user_namespace.h>
> #include <asm/uaccess.h>
>
> /* init to 2 - one for init_task, one to ensure it is never freed */
> @@ -217,7 +218,8 @@ bool may_setgroups(void)
> {
> struct user_namespace *user_ns = current_user_ns();
>
> - return ns_capable(user_ns, CAP_SETGID);
> + return ns_capable(user_ns, CAP_SETGID) &&
> + userns_may_setgroups(user_ns);
> }
>
> /*
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index b99c862a2e3f..27c8dab48c07 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file,
> return false;
> }
>
> +bool userns_may_setgroups(const struct user_namespace *ns)
> +{
> + bool allowed;
> +
> + mutex_lock(&id_map_mutex);
> + /* It is not safe to use setgroups until a gid mapping in
> + * the user namespace has been established.
> + */
> + allowed = ns->gid_map.nr_extents != 0;
> + mutex_unlock(&id_map_mutex);
> +
> + return allowed;
> +}
> +
> static void *userns_get(struct task_struct *task)
> {
> struct user_namespace *user_ns;
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-09 22:50:01

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex

On Tue, Dec 9, 2014 at 12:41 PM, Eric W. Biederman
<[email protected]> wrote:
>
> Generalize id_map_mutex so it can be used for more state of a user namespace.

Reviewed-by: Andy Lutomirski <[email protected]>

>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/user_namespace.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 1e34de2fbd60..44a555ac6104 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -24,6 +24,7 @@
> #include <linux/fs_struct.h>
>
> static struct kmem_cache *user_ns_cachep __read_mostly;
> +static DEFINE_MUTEX(userns_state_mutex);
>
> static bool new_idmap_permitted(const struct file *file,
> struct user_namespace *ns, int cap_setid,
> @@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map,
> return false;
> }
>
> -
> -static DEFINE_MUTEX(id_map_mutex);
> -
> static ssize_t map_write(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos,
> int cap_setid,
> @@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> ssize_t ret = -EINVAL;
>
> /*
> - * The id_map_mutex serializes all writes to any given map.
> + * The userns_state_mutex serializes all writes to any given map.
> *
> * Any map is only ever written once.
> *
> @@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> * order and smp_rmb() is guaranteed that we don't have crazy
> * architectures returning stale data.
> */
> - mutex_lock(&id_map_mutex);
> + mutex_lock(&userns_state_mutex);
>
> ret = -EPERM;
> /* Only allow one successful write to the map */
> @@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
> *ppos = count;
> ret = count;
> out:
> - mutex_unlock(&id_map_mutex);
> + mutex_unlock(&userns_state_mutex);
> if (page)
> free_page(page);
> return ret;
> @@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns)
> {
> bool allowed;
>
> - mutex_lock(&id_map_mutex);
> + mutex_lock(&userns_state_mutex);
> /* It is not safe to use setgroups until a gid mapping in
> * the user namespace has been established.
> */
> allowed = ns->gid_map.nr_extents != 0;
> - mutex_unlock(&id_map_mutex);
> + mutex_unlock(&userns_state_mutex);
>
> return allowed;
> }
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-09 23:00:50

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings

On Tue, Dec 9, 2014 at 12:39 PM, Eric W. Biederman
<[email protected]> wrote:
>
> As any gid mapping will allow and must allow for backwards
> compatibility dropping groups don't allow any gid mappings to be
> established without CAP_SETGID in the parent user namespace.
>
> For a small class of applications this change breaks userspace
> and removes useful functionality. This small class of applications
> includes tools/testing/selftests/mount/unprivilged-remount-test.c
>
> Most of the removed functionality will be added back with the addition
> of a one way knob to disable setgroups. Once setgroups is disabled
> setting the gid_map becomes as safe as setting the uid_map.
>
> For more common applications that set the uid_map and the gid_map
> with privilege this change will have no affect.
>

Reviewed-by: Andy Lutomirski <[email protected]>

> This is part of a fix for CVE-2014-8989.
>
> Cc: [email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/user_namespace.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 27c8dab48c07..1ce6d67c07b7 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
> kuid_t uid = make_kuid(ns->parent, id);
> if (uid_eq(uid, file->f_cred->fsuid))
> return true;
> - } else if (cap_setid == CAP_SETGID) {
> - kgid_t gid = make_kgid(ns->parent, id);
> - if (gid_eq(gid, file->f_cred->fsgid))
> - return true;
> }
> }
>
> --
> 1.9.1
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-10 00:21:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis

On Tue, Dec 9, 2014 at 4:04 PM, Eric W.Biederman <[email protected]> wrote:
>
>
> On December 9, 2014 4:28:38 PM CST, Andy Lutomirski <[email protected]> wrote:
>>On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman
>><[email protected]> wrote:
>>>
>>> - Expose the knob to user space through a proc file
>>/proc/<pid>/setgroups
>>>
>>> A value of "deny" means the setgroups system call is disabled in
>>the
>>> current processes user namespace and can not be enabled in the
>>> future in this user namespace.
>>>
>>> A value of "allow" means the segtoups system call is enabled.
>>>
>>> - Descendant user namespaces inherit the value of setgroups from
>>> their parents.
>>>
>>> - A proc file is used (instead of a sysctl) as sysctls
>>> currently do not pass in a struct file so file_ns_capable
>>> is unusable.
>>
>>Reviewed-by: Andy Lutomirski <[email protected]>
>>
>>But I still don't like the name "setgroups". People may look at that
>>and have no clue what the scope of the setting is. And anyone who, as
>>root, writes "deny" to /proc/self/setgroups, thinking that it acts on
>>self, will be in for a surprise.
>
> True setgroups isn't perfect. Documenting it in a manpage may have to be enough. The only real improvement I can think of would be to make the setting a sysctl. But I think pursuing that approaches the point where perfection is the enemy of getting this problem fixed.
>

Would "userns_setgroups" be okay?

--Andy

> Eric



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-10 16:42:22

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT] Can I get some Tested-By's on this series?


Will people please test these patches with their container project?

These changes break container userspace (hopefully in a minimal way) if
I could have that confirmed by testing I would really appreciate it. I
really don't want to send out a bug fix that accidentally breaks
userspace again.

The only issue sort of under discussion is if there is a better name for
/proc/<pid>/setgroups, and the name of the file will not affect the
functionality of the patchset.

With the code reviewed and written in simple obviously correct, easily
reviewable ways I am hoping/planning to send this to Linus ASAP.

Eric

2014-12-10 22:48:34

by Serge Hallyn

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Quoting Eric W. Biederman ([email protected]):
>
> Will people please test these patches with their container project?
>
> These changes break container userspace (hopefully in a minimal way) if
> I could have that confirmed by testing I would really appreciate it. I
> really don't want to send out a bug fix that accidentally breaks
> userspace again.
>
> The only issue sort of under discussion is if there is a better name for
> /proc/<pid>/setgroups, and the name of the file will not affect the
> functionality of the patchset.
>
> With the code reviewed and written in simple obviously correct, easily
> reviewable ways I am hoping/planning to send this to Linus ASAP.
>
> Eric

Is there a git tree we can clone?

2014-12-10 22:51:05

by Richard Weinberger

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
> Quoting Eric W. Biederman ([email protected]):
>>
>> Will people please test these patches with their container project?
>>
>> These changes break container userspace (hopefully in a minimal way) if
>> I could have that confirmed by testing I would really appreciate it. I
>> really don't want to send out a bug fix that accidentally breaks
>> userspace again.
>>
>> The only issue sort of under discussion is if there is a better name for
>> /proc/<pid>/setgroups, and the name of the file will not affect the
>> functionality of the patchset.
>>
>> With the code reviewed and written in simple obviously correct, easily
>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>>
>> Eric
>
> Is there a git tree we can clone?

I was about to ask that too.
Hopefully I can tomorrow find some time for testing.

Thanks,
//richard

2014-12-10 23:22:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Richard Weinberger <[email protected]> writes:

> Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
>> Quoting Eric W. Biederman ([email protected]):
>>>
>>> Will people please test these patches with their container project?
>>>
>>> These changes break container userspace (hopefully in a minimal way) if
>>> I could have that confirmed by testing I would really appreciate it. I
>>> really don't want to send out a bug fix that accidentally breaks
>>> userspace again.
>>>
>>> The only issue sort of under discussion is if there is a better name for
>>> /proc/<pid>/setgroups, and the name of the file will not affect the
>>> functionality of the patchset.
>>>
>>> With the code reviewed and written in simple obviously correct, easily
>>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>>>
>>> Eric
>>
>> Is there a git tree we can clone?
>
> I was about to ask that too.
> Hopefully I can tomorrow find some time for testing.

git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

That holds my entire queue of fixes against 3.18-rc6

Eric

2014-12-11 19:27:45

by Richard Weinberger

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Am 11.12.2014 um 00:19 schrieb Eric W. Biederman:
> Richard Weinberger <[email protected]> writes:
>
>> Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
>>> Quoting Eric W. Biederman ([email protected]):
>>>>
>>>> Will people please test these patches with their container project?
>>>>
>>>> These changes break container userspace (hopefully in a minimal way) if
>>>> I could have that confirmed by testing I would really appreciate it. I
>>>> really don't want to send out a bug fix that accidentally breaks
>>>> userspace again.
>>>>
>>>> The only issue sort of under discussion is if there is a better name for
>>>> /proc/<pid>/setgroups, and the name of the file will not affect the
>>>> functionality of the patchset.
>>>>
>>>> With the code reviewed and written in simple obviously correct, easily
>>>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>>>>
>>>> Eric
>>>
>>> Is there a git tree we can clone?
>>
>> I was about to ask that too.
>> Hopefully I can tomorrow find some time for testing.
>
> git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing
>
> That holds my entire queue of fixes against 3.18-rc6

So far nothing broke on my libvirt-lxc test bed. :-)
Tested with openSUSE 13.2 and libvirt 1.2.9.

Tested-by: Richard Weinberger <[email protected]>

Thanks,
//richard

2014-12-12 01:30:41

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis

On Thu, Dec 11, 2014 at 5:09 PM, Eric W. Biederman
<[email protected]> wrote:
> [email protected] (Eric W. Biederman) writes:
>
>> Andy Lutomirski <[email protected]> writes:
>>
>>> On Tue, Dec 9, 2014 at 4:04 PM, Eric W.Biederman <[email protected]> wrote:
>>>>
>>>>
>>>> On December 9, 2014 4:28:38 PM CST, Andy Lutomirski <[email protected]> wrote:
>>>>>On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman
>>>>><[email protected]> wrote:
>>>>>>
>>>>>> - Expose the knob to user space through a proc file
>>>>>/proc/<pid>/setgroups
>>>>>>
>>>>>> A value of "deny" means the setgroups system call is disabled in
>>>>>the
>>>>>> current processes user namespace and can not be enabled in the
>>>>>> future in this user namespace.
>>>>>>
>>>>>> A value of "allow" means the segtoups system call is enabled.
>>>>>>
>>>>>> - Descendant user namespaces inherit the value of setgroups from
>>>>>> their parents.
>>>>>>
>>>>>> - A proc file is used (instead of a sysctl) as sysctls
>>>>>> currently do not pass in a struct file so file_ns_capable
>>>>>> is unusable.
>>>>>
>>>>>Reviewed-by: Andy Lutomirski <[email protected]>
>>>>>
>>>>>But I still don't like the name "setgroups". People may look at that
>>>>>and have no clue what the scope of the setting is. And anyone who, as
>>>>>root, writes "deny" to /proc/self/setgroups, thinking that it acts on
>>>>>self, will be in for a surprise.
>>>>
>>>> True setgroups isn't perfect. Documenting it in a manpage may have to be enough. The only real improvement I can think of would be to make the setting a sysctl. But I think pursuing that approaches the point where perfection is the enemy of getting this problem fixed.
>>>>
>>>
>>> Would "userns_setgroups" be okay?
>>
>> Maybe.
>>
>> I just played with this and this is a much bigger booby trap than I had
>> realized. Disabling setgroups disables the possibility of logging in the
>> future and since it is a one way switch the only way out is to reboot.
>>
>> Hooray our software checks the returns of setgroups. Booh. This is a
>> really nasty knob to have anywhere.
>>
>> I need to think about this a little bit. Giving root the power to shoot
>> himself in the foot is one thing. Giving root a loaded gun pointed at
>> his foot with the hammer pulled back, and a sign that says I dare you to
>> pull the trigger, seems like a bad idea.
>>
>> I think I need to reduce when that knob can be used. Grr.
>> Back to the drawing board!
>
> I tried out a bunch of things and finally found a simple rule. Don't
> allow setgroups to be disabled after setgroups has been enabled in a
> user namespace. Or in practical terms don't allow setgroups to be
> disabled after the gid_map has been set.
>
> Which in practice pretty nearly means that we are only allowing writes
> to setgroups when it is a single process and it's eventual children that
> can be affected.
>
> At which point I don't think a name change would make things any
> clearer.

The name change still helps the user to does:

$ ls /proc/self

"setgroups? what's that?"

>
> I have also updated the code to move the permission checks to open
> where they belong (doh!). Patch follows.

Will review and test.

>
> Eric
>
>
>
>
>
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-12 07:04:33

by Chen Hanxiao

[permalink] [raw]
Subject: RE: [CFT] Can I get some Tested-By's on this series?



> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Eric W.
> Biederman
> Sent: Thursday, December 11, 2014 7:20 AM
> To: Richard Weinberger
> Cc: linux-man; Kees Cook; Linux API; Linux Containers; Serge Hallyn; Josh Triplett;
> stable; Andy Lutomirski; Kenton Varda; LSM; Michael Kerrisk-manpages; Casey
> Schaufler; Andrew Morton; [email protected]
> Subject: Re: [CFT] Can I get some Tested-By's on this series?
>
> Richard Weinberger <[email protected]> writes:
>
> > Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
> >> Quoting Eric W. Biederman ([email protected]):
> >>>
> >>> Will people please test these patches with their container project?
> >>>
> >>> These changes break container userspace (hopefully in a minimal way) if
> >>> I could have that confirmed by testing I would really appreciate it. I
> >>> really don't want to send out a bug fix that accidentally breaks
> >>> userspace again.
> >>>
> >>> The only issue sort of under discussion is if there is a better name for
> >>> /proc/<pid>/setgroups, and the name of the file will not affect the
> >>> functionality of the patchset.
> >>>
> >>> With the code reviewed and written in simple obviously correct, easily
> >>> reviewable ways I am hoping/planning to send this to Linus ASAP.
> >>>
> >>> Eric
> >>
> >> Is there a git tree we can clone?
> >
> > I was about to ask that too.
> > Hopefully I can tomorrow find some time for testing.
>
> git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
> for-testing
>
> That holds my entire queue of fixes against 3.18-rc6
>

Tested on Fedora20 with libvirt 1.2.11, works fine.

Tested-by: Chen Hanxiao <[email protected]>

Thanks,
- Chen
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-12-13 22:32:14

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

sorry, I've only been back from the road the days... Two tries at compiling have failed (infrastructure problems, not your set), hoping to fire of another build tonight.On 12/10/14 16:48 Serge Hallyn wrote:
Quoting Eric W. Biederman ([email protected]):
>
> Will people please test these patches with their container project?
>
> These changes break container userspace (hopefully in a minimal way) if
> I could have that confirmed by testing I would really appreciate it. I
> really don't want to send out a bug fix that accidentally breaks
> userspace again.
>
> The only issue sort of under discussion is if there is a better name for
> /proc/<pid>/setgroups, and the name of the file will not affect the
> functionality of the patchset.
>
> With the code reviewed and written in simple obviously correct, easily
> reviewable ways I am hoping/planning to send this to Linus ASAP.
>
> Eric

Is there a git tree we can clone?

2014-12-15 19:38:48

by Serge Hallyn

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Quoting Eric W. Biederman ([email protected]):
> St?phane Graber <[email protected]> writes:
>
> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote:
> >> Serge Hallyn <[email protected]> writes:
> >>
> >> > Quoting Eric W. Biederman ([email protected]):
> >> >>
> >> >> Will people please test these patches with their container project?
> >> >>
> >> >> These changes break container userspace (hopefully in a minimal way) if
> >> >> I could have that confirmed by testing I would really appreciate it. I
> >> >> really don't want to send out a bug fix that accidentally breaks
> >> >> userspace again.
> >> >>
> >> >> The only issue sort of under discussion is if there is a better name for
> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the
> >> >> functionality of the patchset.
> >> >>
> >> >> With the code reviewed and written in simple obviously correct, easily
> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP.
> >> >>
> >> >> Eric
> >> >
> >> > Is there a git tree we can clone?
> >>
> >> Have either of you been able to check to see if any of my changes
> >> affects lxc?
> >>
> >> I am trying to gauge how hard and how fast I should push to Linus. lxc
> >> being the largest adopter of unprivileged user namespaces for general
> >> purpose containers.
> >>
> >> I expect you just call newuidmap and newgidmap and don't actually care
> >> about not being able to set gid_map without privilege. But I really
> >> want to avoid pushing a security fix and then being surprised that
> >> things like lxc break.
> >>
> >> Eric
> >
> > Hi Eric,
> >
> > I've unfortunately been pretty busy this week as I was (well, still am)
> > travelling to South Africa for a meeting. I don't have a full kernel
> > tree around here and a full git clone isn't really doable over the kind
> > of Internet I've got here :)
> >
> > Hopefully Serge can give it a quick try, otherwise I should be able to
> > do some tests on Tuesday when I'm back home.
>
> I thought Serge was going to but I haven't heard yet so I am prodding ;-)

Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
Just to be sure I was testing the right thing I also tested using
my unprivileged nsexec testcases, and they failed on setgroup/setgid
as now expected, and succeeded there without your patches.

thanks,
-serge

2014-12-15 20:14:39

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Serge Hallyn <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> Stéphane Graber <[email protected]> writes:
>>
>> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote:
>> >> Serge Hallyn <[email protected]> writes:
>> >>
>> >> > Quoting Eric W. Biederman ([email protected]):
>> >> >>
>> >> >> Will people please test these patches with their container project?
>> >> >>
>> >> >> These changes break container userspace (hopefully in a minimal way) if
>> >> >> I could have that confirmed by testing I would really appreciate it. I
>> >> >> really don't want to send out a bug fix that accidentally breaks
>> >> >> userspace again.
>> >> >>
>> >> >> The only issue sort of under discussion is if there is a better name for
>> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the
>> >> >> functionality of the patchset.
>> >> >>
>> >> >> With the code reviewed and written in simple obviously correct, easily
>> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP.
>> >> >>
>> >> >> Eric
>> >> >
>> >> > Is there a git tree we can clone?
>> >>
>> >> Have either of you been able to check to see if any of my changes
>> >> affects lxc?
>> >>
>> >> I am trying to gauge how hard and how fast I should push to Linus. lxc
>> >> being the largest adopter of unprivileged user namespaces for general
>> >> purpose containers.
>> >>
>> >> I expect you just call newuidmap and newgidmap and don't actually care
>> >> about not being able to set gid_map without privilege. But I really
>> >> want to avoid pushing a security fix and then being surprised that
>> >> things like lxc break.
>> >>
>> >> Eric
>> >
>> > Hi Eric,
>> >
>> > I've unfortunately been pretty busy this week as I was (well, still am)
>> > travelling to South Africa for a meeting. I don't have a full kernel
>> > tree around here and a full git clone isn't really doable over the kind
>> > of Internet I've got here :)
>> >
>> > Hopefully Serge can give it a quick try, otherwise I should be able to
>> > do some tests on Tuesday when I'm back home.
>>
>> I thought Serge was going to but I haven't heard yet so I am prodding ;-)
>
> Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> Just to be sure I was testing the right thing I also tested using
> my unprivileged nsexec testcases, and they failed on setgroup/setgid
> as now expected, and succeeded there without your patches.

Thanks.

Serge unless you object will add your Tested-By to my pull message to Linus.

Minor question do you runprivileged nsexec test cases test to see if the
write to gid_map succeeds? I would have expected the gid_map write to
fail before the setgroups setgid system calls came into play.

Eric

2014-12-15 20:49:46

by Serge Hallyn

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Quoting Eric W. Biederman ([email protected]):
> Serge Hallyn <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >> St?phane Graber <[email protected]> writes:
> >>
> >> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote:
> >> >> Serge Hallyn <[email protected]> writes:
> >> >>
> >> >> > Quoting Eric W. Biederman ([email protected]):
> >> >> >>
> >> >> >> Will people please test these patches with their container project?
> >> >> >>
> >> >> >> These changes break container userspace (hopefully in a minimal way) if
> >> >> >> I could have that confirmed by testing I would really appreciate it. I
> >> >> >> really don't want to send out a bug fix that accidentally breaks
> >> >> >> userspace again.
> >> >> >>
> >> >> >> The only issue sort of under discussion is if there is a better name for
> >> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the
> >> >> >> functionality of the patchset.
> >> >> >>
> >> >> >> With the code reviewed and written in simple obviously correct, easily
> >> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP.
> >> >> >>
> >> >> >> Eric
> >> >> >
> >> >> > Is there a git tree we can clone?
> >> >>
> >> >> Have either of you been able to check to see if any of my changes
> >> >> affects lxc?
> >> >>
> >> >> I am trying to gauge how hard and how fast I should push to Linus. lxc
> >> >> being the largest adopter of unprivileged user namespaces for general
> >> >> purpose containers.
> >> >>
> >> >> I expect you just call newuidmap and newgidmap and don't actually care
> >> >> about not being able to set gid_map without privilege. But I really
> >> >> want to avoid pushing a security fix and then being surprised that
> >> >> things like lxc break.
> >> >>
> >> >> Eric
> >> >
> >> > Hi Eric,
> >> >
> >> > I've unfortunately been pretty busy this week as I was (well, still am)
> >> > travelling to South Africa for a meeting. I don't have a full kernel
> >> > tree around here and a full git clone isn't really doable over the kind
> >> > of Internet I've got here :)
> >> >
> >> > Hopefully Serge can give it a quick try, otherwise I should be able to
> >> > do some tests on Tuesday when I'm back home.
> >>
> >> I thought Serge was going to but I haven't heard yet so I am prodding ;-)
> >
> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> > Just to be sure I was testing the right thing I also tested using
> > my unprivileged nsexec testcases, and they failed on setgroup/setgid
> > as now expected, and succeeded there without your patches.
>
> Thanks.
>
> Serge unless you object will add your Tested-By to my pull message to Linus.

Sounds good.

> Minor question do you runprivileged nsexec test cases test to see if the
> write to gid_map succeeds? I would have expected the gid_map write to
> fail before the setgroups setgid system calls came into play.

Yes, I did that by hand, and it failed (with your kernel).

-serge

2014-12-16 02:06:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

On Wed, Dec 10, 2014 at 8:39 AM, Eric W. Biederman
<[email protected]> wrote:
>
> Will people please test these patches with their container project?
>
> These changes break container userspace (hopefully in a minimal way) if
> I could have that confirmed by testing I would really appreciate it. I
> really don't want to send out a bug fix that accidentally breaks
> userspace again.
>
> The only issue sort of under discussion is if there is a better name for
> /proc/<pid>/setgroups, and the name of the file will not affect the
> functionality of the patchset.
>
> With the code reviewed and written in simple obviously correct, easily
> reviewable ways I am hoping/planning to send this to Linus ASAP.


I tested this with Sandstorm. It breaks as is and it works if I add
the setgroups thing.

Tested-by: Andy Lutomirski <[email protected]> # breaks things as designed :(

I still don't like the name "setgroups".

--Andy

>
> Eric



--
Andy Lutomirski
AMA Capital Management, LLC

2014-12-16 09:23:38

by Richard Weinberger

[permalink] [raw]
Subject: Re: [CFT] Can I get some Tested-By's on this series?

Am 16.12.2014 um 03:05 schrieb Andy Lutomirski:
> On Wed, Dec 10, 2014 at 8:39 AM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> Will people please test these patches with their container project?
>>
>> These changes break container userspace (hopefully in a minimal way) if
>> I could have that confirmed by testing I would really appreciate it. I
>> really don't want to send out a bug fix that accidentally breaks
>> userspace again.
>>
>> The only issue sort of under discussion is if there is a better name for
>> /proc/<pid>/setgroups, and the name of the file will not affect the
>> functionality of the patchset.
>>
>> With the code reviewed and written in simple obviously correct, easily
>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>
>
> I tested this with Sandstorm. It breaks as is and it works if I add
> the setgroups thing.
>
> Tested-by: Andy Lutomirski <[email protected]> # breaks things as designed :(
>
> I still don't like the name "setgroups".

I agree that the name is not optimal.
But I don't have a counterproposal as my naming skills are miserable.

Thanks,
//richard