2021-05-10 13:07:28

by Giuseppe Scrivano

[permalink] [raw]
Subject: [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups

This series is based on some old patches I've been playing with some
years ago, but they were never sent to lkml as I was not sure about
their complexity/usefulness ratio. It was recently reported by
another user that these patches are still useful[1] so I am submitting
the last version and see what other folks think about this feature.

Since the fix for CVE-2014-8989 in order to set a gids mapping for a
user namespace when the user namespace owner doesn't have CAP_SETGID
in its parent, it is necessary to first disable setgroups(2) through
/proc/PID/setgroups.

Setting up a user namespace with multiple IDs mapped into is usually
done through the privileged helpers newuidmap/newgidmap.
Since these helpers run either as setuid or with CAP_SET[U,G]ID file
capabilities, it is not necessary to disable setgroups(2) in the
created user namespace. The user running in the user namespace can
use setgroups(2) and drop the additional groups that it had initially.

This is still an issue on systems where negative groups ACLs, i.e. the
group permissions are more restrictive than the entry for the other
categories, are used. With such configuration, allowing setgroups(2)
would cause the same security vulnerability described by
CVE-2014-8989.

This patchset adds a new 'shadow' mode for the /proc/PID/setgroups
file. It permits to safely enable setgroups also when negative groups
ACLs are used.
When the 'shadow' mode is written to /proc/PID/setgroups, the
current process groups are stored into the user namespace and they
will be silently added on each setgroups(2) call. A child user
namespace won't be able to drop these groups anymore.

To fully take advantage of this feature, newgidmap will also need to
learn about the 'shadow' mode. An idea is that when the system or the
user are using negative groups ACLs, then newgidmap needs to check
that /proc/PID/setgroups is set to 'deny' or 'shadow' before allowing a
mapping. The configuration for negative groups ACLs can either be a
system wide or per user setting.

[1] https://lore.kernel.org/lkml/[email protected]/T/#mc53e0fc80203b8209a8836d5861a267ce22c5c0f

Giuseppe Scrivano (3):
setgroups: new mode 'shadow' for /proc/PID/setgroups
getgroups: hide unknown groups
proc: hide unknown groups in status

fs/proc/array.c | 12 +++-
include/linux/cred.h | 4 +-
include/linux/user_namespace.h | 11 +++-
kernel/groups.c | 100 +++++++++++++++++++++------------
kernel/uid16.c | 90 ++++++++++++++++++-----------
kernel/user_namespace.c | 34 +++++++++--
6 files changed, 169 insertions(+), 82 deletions(-)

--
2.31.1


2021-05-10 13:15:26

by Giuseppe Scrivano

[permalink] [raw]
Subject: [RFC PATCH 3/3] proc: hide unknown groups in status

when the "shadow" mode is enabled for the user namespace, do not copy
to userspace the groups that are not mapped.

Signed-off-by: Giuseppe Scrivano <[email protected]>
---
fs/proc/array.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 7ec59171f197..81dc733773d4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -202,9 +202,17 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,

seq_puts(m, "\nGroups:\t");
group_info = cred->group_info;
- for (g = 0; g < group_info->ngroups; g++)
+ for (g = 0; g < group_info->ngroups; g++) {
+ gid_t gid = from_kgid(user_ns, group_info->gid[g]);
+
+ if (gid == (gid_t)-1) {
+ if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
+ continue;
+ gid = overflowgid;
+ }
seq_put_decimal_ull(m, g ? " " : "",
- from_kgid_munged(user_ns, group_info->gid[g]));
+ gid);
+ }
put_cred(cred);
/* Trailing space shouldn't have been added in the first place. */
seq_putc(m, ' ');
--
2.31.1

2021-05-10 13:15:38

by Giuseppe Scrivano

[permalink] [raw]
Subject: [RFC PATCH 2/3] getgroups: hide unknown groups

do not copy to userspace the groups that are not known in the
namespace.

Signed-off-by: Giuseppe Scrivano <[email protected]>
---
kernel/groups.c | 40 +++++++++++++++++++--------------------
kernel/uid16.c | 50 ++++++++++++++++++++++++-------------------------
2 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0c3b49da19e..97fa9faa7813 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -35,19 +35,30 @@ EXPORT_SYMBOL(groups_free);

/* export the group_info to a user-space array */
static int groups_to_user(gid_t __user *grouplist,
- const struct group_info *group_info)
+ const struct group_info *group_info,
+ int gidsetsize)
{
struct user_namespace *user_ns = current_user_ns();
- int i;
unsigned int count = group_info->ngroups;
+ int i, ngroups = 0;

for (i = 0; i < count; i++) {
gid_t gid;
- gid = from_kgid_munged(user_ns, group_info->gid[i]);
- if (put_user(gid, grouplist+i))
- return -EFAULT;
+ gid = from_kgid(user_ns, group_info->gid[i]);
+ if (gid == (gid_t)-1) {
+ if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
+ continue;
+ gid = overflowgid;
+ }
+ if (gidsetsize) {
+ if (ngroups == gidsetsize)
+ return -EINVAL;
+ if (put_user(gid, grouplist + ngroups))
+ return -EFAULT;
+ }
+ ngroups++;
}
- return 0;
+ return ngroups;
}

/* fill a group_info from a user-space array - it must be allocated already */
@@ -146,26 +157,13 @@ EXPORT_SYMBOL(set_current_groups);

SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
{
+ /* no need to grab task_lock here; it cannot change */
const struct cred *cred = current_cred();
- int i;

if (gidsetsize < 0)
return -EINVAL;

- /* no need to grab task_lock here; it cannot change */
- i = cred->group_info->ngroups;
- if (gidsetsize) {
- if (i > gidsetsize) {
- i = -EINVAL;
- goto out;
- }
- if (groups_to_user(grouplist, cred->group_info)) {
- i = -EFAULT;
- goto out;
- }
- }
-out:
- return i;
+ return groups_to_user(grouplist, cred->group_info, gidsetsize);
}

bool may_setgroups(struct group_info **shadowed_groups)
diff --git a/kernel/uid16.c b/kernel/uid16.c
index cb1110f083ce..6f2dc793b5f8 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -112,21 +112,33 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
}

static int groups16_to_user(old_gid_t __user *grouplist,
- struct group_info *group_info)
+ struct group_info *group_info,
+ int gidsetsize)
{
struct user_namespace *user_ns = current_user_ns();
- int i;
- old_gid_t group;
- kgid_t kgid;
+ unsigned int count = group_info->ngroups;
+ int i, ngroups = 0;

- for (i = 0; i < group_info->ngroups; i++) {
- kgid = group_info->gid[i];
- group = high2lowgid(from_kgid_munged(user_ns, kgid));
- if (put_user(group, grouplist+i))
- return -EFAULT;
+ for (i = 0; i < count; i++) {
+ gid_t gid;
+ old_gid_t group;
+
+ gid = from_kgid(user_ns, group_info->gid[i]);
+ if (gid == (gid_t)-1) {
+ if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
+ continue;
+ gid = overflowgid;
+ }
+ group = high2lowgid(gid);
+ if (gidsetsize) {
+ if (ngroups == gidsetsize)
+ return -EINVAL;
+ if (put_user(group, grouplist + ngroups))
+ return -EFAULT;
+ }
+ ngroups++;
}
-
- return 0;
+ return ngroups;
}

static int groups16_from_user(struct group_info *group_info,
@@ -153,25 +165,13 @@ static int groups16_from_user(struct group_info *group_info,

SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
{
+ /* no need to grab task_lock here; it cannot change */
const struct cred *cred = current_cred();
- int i;

if (gidsetsize < 0)
return -EINVAL;

- i = cred->group_info->ngroups;
- if (gidsetsize) {
- if (i > gidsetsize) {
- i = -EINVAL;
- goto out;
- }
- if (groups16_to_user(grouplist, cred->group_info)) {
- i = -EFAULT;
- goto out;
- }
- }
-out:
- return i;
+ return groups16_to_user(grouplist, cred->group_info, gidsetsize);
}

SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
--
2.31.1

2021-05-10 16:09:22

by Snaipe

[permalink] [raw]
Subject: [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups

"Giuseppe Scrivano" <[email protected]> writes:
> This series is based on some old patches I've been playing with some
> years ago, but they were never sent to lkml as I was not sure about
> their complexity/usefulness ratio. It was recently reported by
> another user that these patches are still useful[1] so I am submitting
> the last version and see what other folks think about this feature.

For context, the reason why these patches are useful to us is that our
use-case of user namespaces includes running executables within an
{u,g}id space controlled by the calling user, while still remaining in
the original root filesystem.

For example, we use user namespaces through one of our tools[1] as a
substitute to fakeroot in order to build software that would otherwise
need root permission to package, like sudo or ping, where setting the
setuid bit or more importantly file capabilities are necessary.

In these use-cases, still respecting the original group membership is
actually desired. Not just because of the negative-access permission
issues, but because it's possible to lose legitimate access to files
if using setgroups while holding membership of unmapped GIDs. This can
be very surprising behaviour, especially when, as an example, the caller
suddenly loses access to their current working directory after entering
a user namespace and using setgroups.

I've seen other solutions to the original problem mentioned, like
introducing a new sysctl to convey that the system does not use negative-
access permissions -- I believe these alternate solutions do not solve
my second point about losing legitimate access, while this patchset does.
I've tested an older version of these patches and they have all of the
desired properties:

$ id
uid=1000(snaipe) gid=1000(snaipe) groups=1000(snaipe),998(wheel)

$ bst grep . /proc/self/uid_map /proc/self/gid_map /proc/self/setgroups
/proc/self/uid_map: 0 1000 1
/proc/self/uid_map: 1 100000 65536
/proc/self/gid_map: 0 1000 1
/proc/self/gid_map: 1 100000 65536
/proc/self/setgroups:shadow

$ ls -l
total 8
drwxr-xr-x 2 root wheel 4096 Apr 23 14:18 allowed
drwx---r-x 2 root wheel 4096 Apr 23 14:18 denied

$ bst sh -c 'id; ls allowed denied'
uid=0(root) gid=0(root) groups=0(root)
allowed:
ls: cannot open directory 'denied': Permission denied

$ bst --groups 1 sh -c 'id; ls allowed denied'
uid=0(root) gid=0(root) groups=0(root),1(daemon)
allowed:
ls: cannot open directory 'denied': Permission denied

Ultimately, we want to make it safe to run our tool as an unprivileged user,
and while we're currently riding the status-quo of "safe-to-use-but-not-
if-you're-using-negative-permissions", having a way for us to do the right
thing -- without relying on the gotcha that a system administrator must
configure a system knob to make it safe -- is quite attractive.

[1]: https://github.com/aristanetworks/bst

--
Snaipe

2021-05-15 16:31:12

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] proc: hide unknown groups in status

On Mon, May 10, 2021 at 03:00:11PM +0200, Giuseppe Scrivano wrote:
> when the "shadow" mode is enabled for the user namespace, do not copy
> to userspace the groups that are not mapped.
>
> Signed-off-by: Giuseppe Scrivano <[email protected]>

Reviewed-by: Serge Hallyn <[email protected]>

> ---
> fs/proc/array.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 7ec59171f197..81dc733773d4 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -202,9 +202,17 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>
> seq_puts(m, "\nGroups:\t");
> group_info = cred->group_info;
> - for (g = 0; g < group_info->ngroups; g++)
> + for (g = 0; g < group_info->ngroups; g++) {
> + gid_t gid = from_kgid(user_ns, group_info->gid[g]);
> +
> + if (gid == (gid_t)-1) {
> + if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
> + continue;
> + gid = overflowgid;
> + }
> seq_put_decimal_ull(m, g ? " " : "",
> - from_kgid_munged(user_ns, group_info->gid[g]));
> + gid);
> + }
> put_cred(cred);
> /* Trailing space shouldn't have been added in the first place. */
> seq_putc(m, ' ');
> --
> 2.31.1

2021-05-15 16:32:30

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] getgroups: hide unknown groups

On Mon, May 10, 2021 at 03:00:10PM +0200, Giuseppe Scrivano wrote:
> do not copy to userspace the groups that are not known in the
> namespace.
>
> Signed-off-by: Giuseppe Scrivano <[email protected]>

Reviewed-by: Serge Hallyn <[email protected]>

> ---
> kernel/groups.c | 40 +++++++++++++++++++--------------------
> kernel/uid16.c | 50 ++++++++++++++++++++++++-------------------------
> 2 files changed, 44 insertions(+), 46 deletions(-)
>
> diff --git a/kernel/groups.c b/kernel/groups.c
> index f0c3b49da19e..97fa9faa7813 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -35,19 +35,30 @@ EXPORT_SYMBOL(groups_free);
>
> /* export the group_info to a user-space array */
> static int groups_to_user(gid_t __user *grouplist,
> - const struct group_info *group_info)
> + const struct group_info *group_info,
> + int gidsetsize)
> {
> struct user_namespace *user_ns = current_user_ns();
> - int i;
> unsigned int count = group_info->ngroups;
> + int i, ngroups = 0;
>
> for (i = 0; i < count; i++) {
> gid_t gid;
> - gid = from_kgid_munged(user_ns, group_info->gid[i]);
> - if (put_user(gid, grouplist+i))
> - return -EFAULT;
> + gid = from_kgid(user_ns, group_info->gid[i]);
> + if (gid == (gid_t)-1) {
> + if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
> + continue;
> + gid = overflowgid;
> + }
> + if (gidsetsize) {
> + if (ngroups == gidsetsize)
> + return -EINVAL;
> + if (put_user(gid, grouplist + ngroups))
> + return -EFAULT;
> + }
> + ngroups++;
> }
> - return 0;
> + return ngroups;
> }
>
> /* fill a group_info from a user-space array - it must be allocated already */
> @@ -146,26 +157,13 @@ EXPORT_SYMBOL(set_current_groups);
>
> SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
> {
> + /* no need to grab task_lock here; it cannot change */
> const struct cred *cred = current_cred();
> - int i;
>
> if (gidsetsize < 0)
> return -EINVAL;
>
> - /* no need to grab task_lock here; it cannot change */
> - i = cred->group_info->ngroups;
> - if (gidsetsize) {
> - if (i > gidsetsize) {
> - i = -EINVAL;
> - goto out;
> - }
> - if (groups_to_user(grouplist, cred->group_info)) {
> - i = -EFAULT;
> - goto out;
> - }
> - }
> -out:
> - return i;
> + return groups_to_user(grouplist, cred->group_info, gidsetsize);
> }
>
> bool may_setgroups(struct group_info **shadowed_groups)
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index cb1110f083ce..6f2dc793b5f8 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -112,21 +112,33 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
> }
>
> static int groups16_to_user(old_gid_t __user *grouplist,
> - struct group_info *group_info)
> + struct group_info *group_info,
> + int gidsetsize)
> {
> struct user_namespace *user_ns = current_user_ns();
> - int i;
> - old_gid_t group;
> - kgid_t kgid;
> + unsigned int count = group_info->ngroups;
> + int i, ngroups = 0;
>
> - for (i = 0; i < group_info->ngroups; i++) {
> - kgid = group_info->gid[i];
> - group = high2lowgid(from_kgid_munged(user_ns, kgid));
> - if (put_user(group, grouplist+i))
> - return -EFAULT;
> + for (i = 0; i < count; i++) {
> + gid_t gid;
> + old_gid_t group;
> +
> + gid = from_kgid(user_ns, group_info->gid[i]);
> + if (gid == (gid_t)-1) {
> + if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
> + continue;
> + gid = overflowgid;
> + }
> + group = high2lowgid(gid);
> + if (gidsetsize) {
> + if (ngroups == gidsetsize)
> + return -EINVAL;
> + if (put_user(group, grouplist + ngroups))
> + return -EFAULT;
> + }
> + ngroups++;
> }
> -
> - return 0;
> + return ngroups;
> }
>
> static int groups16_from_user(struct group_info *group_info,
> @@ -153,25 +165,13 @@ static int groups16_from_user(struct group_info *group_info,
>
> SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> {
> + /* no need to grab task_lock here; it cannot change */
> const struct cred *cred = current_cred();
> - int i;
>
> if (gidsetsize < 0)
> return -EINVAL;
>
> - i = cred->group_info->ngroups;
> - if (gidsetsize) {
> - if (i > gidsetsize) {
> - i = -EINVAL;
> - goto out;
> - }
> - if (groups16_to_user(grouplist, cred->group_info)) {
> - i = -EFAULT;
> - goto out;
> - }
> - }
> -out:
> - return i;
> + return groups16_to_user(grouplist, cred->group_info, gidsetsize);
> }
>
> SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> --
> 2.31.1

2021-05-21 20:21:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups

Giuseppe Scrivano <[email protected]> writes:

> This series is based on some old patches I've been playing with some
> years ago, but they were never sent to lkml as I was not sure about
> their complexity/usefulness ratio. It was recently reported by
> another user that these patches are still useful[1] so I am submitting
> the last version and see what other folks think about this feature.
>
> Since the fix for CVE-2014-8989 in order to set a gids mapping for a
> user namespace when the user namespace owner doesn't have CAP_SETGID
> in its parent, it is necessary to first disable setgroups(2) through
> /proc/PID/setgroups.
>
> Setting up a user namespace with multiple IDs mapped into is usually
> done through the privileged helpers newuidmap/newgidmap.
> Since these helpers run either as setuid or with CAP_SET[U,G]ID file
> capabilities, it is not necessary to disable setgroups(2) in the
> created user namespace. The user running in the user namespace can
> use setgroups(2) and drop the additional groups that it had initially.
>
> This is still an issue on systems where negative groups ACLs, i.e. the
> group permissions are more restrictive than the entry for the other
> categories, are used. With such configuration, allowing setgroups(2)
> would cause the same security vulnerability described by
> CVE-2014-8989.

Do you have any experience or any documentation about systems that are
using groups to deny access?

There are some deployments somewhere, but last I looked they were rare
enough that the intersection between systems using groups to deny access
and systems deploying containers could reasonably be assumed to be the
empty set?

Before we seriously consider merging a change like this I believe we
need some references to actual deployed systems. As adding a feature
that is designed around a premise of a security model that people
are not using will likely lead to poor testing, poor review and
not enough feedback to get the rough edges off.


I suspect systems need to preserve some set of groups released from
the restriction of needing to preserve negative groups could result
in a very different result.

Eric

2021-05-24 13:43:46

by Giuseppe Scrivano

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups

[email protected] (Eric W. Biederman) writes:

> Giuseppe Scrivano <[email protected]> writes:
>
>> This series is based on some old patches I've been playing with some
>> years ago, but they were never sent to lkml as I was not sure about
>> their complexity/usefulness ratio. It was recently reported by
>> another user that these patches are still useful[1] so I am submitting
>> the last version and see what other folks think about this feature.
>>
>> Since the fix for CVE-2014-8989 in order to set a gids mapping for a
>> user namespace when the user namespace owner doesn't have CAP_SETGID
>> in its parent, it is necessary to first disable setgroups(2) through
>> /proc/PID/setgroups.
>>
>> Setting up a user namespace with multiple IDs mapped into is usually
>> done through the privileged helpers newuidmap/newgidmap.
>> Since these helpers run either as setuid or with CAP_SET[U,G]ID file
>> capabilities, it is not necessary to disable setgroups(2) in the
>> created user namespace. The user running in the user namespace can
>> use setgroups(2) and drop the additional groups that it had initially.
>>
>> This is still an issue on systems where negative groups ACLs, i.e. the
>> group permissions are more restrictive than the entry for the other
>> categories, are used. With such configuration, allowing setgroups(2)
>> would cause the same security vulnerability described by
>> CVE-2014-8989.
>
> Do you have any experience or any documentation about systems that are
> using groups to deny access?
>
> There are some deployments somewhere, but last I looked they were rare
> enough that the intersection between systems using groups to deny access
> and systems deploying containers could reasonably be assumed to be the
> empty set?
>
> Before we seriously consider merging a change like this I believe we
> need some references to actual deployed systems. As adding a feature
> that is designed around a premise of a security model that people
> are not using will likely lead to poor testing, poor review and
> not enough feedback to get the rough edges off.

Snaipe (added to CC) has raised this point some weeks ago. Snaipe, do
you have any more information to share on what systems are using user
namespaces and deny access through groups?

Giuseppe