LinuxLists.cc - [RFC PATCH 0/3] System call to switch user credentials

2013-10-16 22:02:36

Subject: [RFC PATCH 0/3] System call to switch user credentials

This system call implementation is the result of discussions at LSF
this last Feb about providing better kernel support to user mode file
servers.

Our use case is an NFS+pNFS+9P file server in user space. We have to
switch user credentials for a number of operations such as CREATE,
MKDIR, and WRITE. We currently use setfsuid(), setfsgid(), and setgroups()
for each of these calls followed by the same set of syscalls to revert
to root privileges. We also must do a setcap to disable root privs so that
quotas and access checks work properly. This results in a minimum of
7 system calls for each affected filesystem operation. Each syscall
in this set creates a new creds object with its associated RCU resources.

Knfsd calls nfsd_setuser() to do the same thing in one call.

This system call does the same function as nfsd_setuser() but for user space.
It replaces the six system calls with just two and uses RCU more efficiently
by only doing it once. This is done using the following struct which
combines all of the arguments that are passed but the corrent syscalls
and passes them to the system call.

struct user_creds {
uid_t uid;
gid_t gid;
unsigned ngroups;
gid_t altgroups[0];
};

Inside our server, we have implemented two functions to manage credentials
using a local user identity cache that has the following structure.
The req_ctx contains two members of interest, a pointer to the credentials
that are constructed by the server from the protocol and a file descriptor
which is described below. The rest of the structure is housekeeping for
the user identity cache.

struct req_ctx {
/* other stuff like avl tree links */
struct user_creds *creds;
int creds_fd;
};

The typical sequence for a protocol operation that creates or morphs
an object in the filesystem is:

ctx = get_ctx(me->uid);
become_client(ctx);
/* mkdir, mknod, write as client user */
restore_creds();

I have left out error handling to simplify the flow. This replaces the
current setfsuid();setfsgid();setgroups(); before and after. get_ctx()
does a lookup in the cache.

The become_client function uses two forms of the system call. We will
take the second first. The SWCREDS_FSIDS command creates a new creds
for the task and fills it from the user_creds argument. It also clears
the set of root capabilities in the effective capabilities. This is
functionally equivalent to what nfsd_setuser() does for knfsd.

We currently set the reduced capabilities globally in order to keep the
overhead down. This system call does this per call, leaving the rest
of the server with full capabilities.

This version of the system call opens an anonymous file and returns
an fd for it. This fd is useless for I/O but it does allow
us to cache creds cheaply. We close the file when we purge the cache
entry.

The first form uses the SWCREDS_FROMFD command and the appropriate fd
that was returned for this client user earlier. The creds referenced
by the fd are used to override_creds the task's effective creds. Actually,
any open file will do but the opened anonymous file is the least
overhead because all it consumes is a filp and fd slot. The override_creds
does not consume any RCU resources so it is much faster and consumes
fewer resources.

int become_client(struct req_ctx *ctx)
{
int ret;

if (ctx->creds_fd >= 0) {
ret = switch_creds(SWCREDS_FROMFD, ctx->creds_fd);
if (ret < 0) {
perror("become_client failed!");
return ret;
}
} else {
ret = switch_creds(SWCREDS_FSIDS, (unsigned long)ctx->creds);
if (ret < 0) {
perror("become_client with creds failed!");
return ret;
} else {
fprintf(stderr, "New client: uid= %d, fd = %d\n",
ctx->creds->uid, ret);
ctx->creds_fd = ret;
}
}
return 0;
}

The restore_creds function simply uses the SWCREDS_REVERT command which
restores the task's real creds. This is the safest route in our code
but one could also switch directly to another set safely.

int restore_creds(void)
{
int ret;

ret = switch_creds(SWCREDS_REVERT, 0);
if (ret < 0) {
perror("switch_creds back failed!\n");
return ret;
}
return 0;
}

The first patch implements the system call itself. The second two add
the syscall linkage for X86 and X86_64. I chose the next available
numbers for those architectures as of 3.12-RC5. I added these patches as a
temporary bridge until official numbers are assigned. I have also not
added entries for other architectures but there is nothing architecturally
dependent in this syscall so when appropriate, numbers can be assigned.

Please review and comment to me. The code fragments above are from my
test program.

Regards,

Jim Lieb
NFS Ganesha project

2013-10-16 22:02:39

by Jim Lieb

[permalink] [raw]

Subject: [PATCH 3/3] switch_creds: Assign x86_64 syscall number for switch_creds

This is a temporary while waiting for syscall number assignment.

Signed-off-by: Jim Lieb <[email protected]>
---
arch/x86/syscalls/syscall_64.tbl | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..f46b75c 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
311 64 process_vm_writev sys_process_vm_writev
312 common kcmp sys_kcmp
313 common finit_module sys_finit_module
+314 common switch_creds sys_switch_creds

#
# x32-specific system call numbers start at 512 to avoid cache impact
--
1.8.3.1

2013-10-16 22:02:57

by Jim Lieb

[permalink] [raw]

Subject: [PATCH 2/3] switch_creds: Add x86 syscall number

This is temporary number awaiting syscall number assignment.

Signed-off-by: Jim Lieb <[email protected]>
---
arch/x86/syscalls/syscall_32.tbl | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index aabfb83..b836839 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -357,3 +357,4 @@
348 i386 process_vm_writev sys_process_vm_writev compat_sys_process_vm_writev
349 i386 kcmp sys_kcmp
350 i386 finit_module sys_finit_module
+351 i386 switch_creds sys_switch_creds
--
1.8.3.1

2013-10-16 22:03:26

by Jim Lieb

[permalink] [raw]

Subject: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

File servers must do some operations with the credentials of
their client. This syscall switches the key credentials similar
to nfsd_setuser() in fs/nfsd/auth.c with the capability of retaining a
handle to the credentials by way of an fd for an open anonymous file.
This makes switching for subsequent operations for that client more efficient.

Signed-off-by: Jim Lieb <[email protected]>
---
include/linux/cred.h | 15 ++++
include/linux/syscalls.h | 2 +
kernel/sys.c | 175 +++++++++++++++++++++++++++++++++++++++++++++++
kernel/sys_ni.c | 3 +
4 files changed, 195 insertions(+)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 04421e8..66a006e 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -138,6 +138,21 @@ struct cred {
struct rcu_head rcu; /* RCU deletion hook */
};

+/* switch_creds() syscall parameters*/
+
+#define SWCREDS_REVERT 0
+#define SWCREDS_FROMFD 1
+#define SWCREDS_FSIDS 2
+
+/* arg for SWCREDS_FSIDS command */
+
+struct user_creds {
+ uid_t uid;
+ gid_t gid;
+ unsigned ngroups;
+ gid_t altgroups[0];
+};
+
extern void __put_cred(struct cred *);
extern void exit_creds(struct task_struct *);
extern int copy_creds(struct task_struct *, unsigned long);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 7fac04e..3550a8f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -64,6 +64,7 @@ struct old_linux_dirent;
struct perf_event_attr;
struct file_handle;
struct sigaltstack;
+struct user_creds;

#include <linux/types.h>
#include <linux/aio_abi.h>
@@ -231,6 +232,7 @@ asmlinkage long sys_setresuid(uid_t ruid, uid_t euid, uid_t suid);
asmlinkage long sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid);
asmlinkage long sys_setfsuid(uid_t uid);
asmlinkage long sys_setfsgid(gid_t gid);
+asmlinkage long sys_switch_creds(int cmd, unsigned long arg);
asmlinkage long sys_setpgid(pid_t pid, pid_t pgid);
asmlinkage long sys_setsid(void);
asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist);
diff --git a/kernel/sys.c b/kernel/sys.c
index c18ecca..cebc661 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -53,6 +53,7 @@
#include <linux/rcupdate.h>
#include <linux/uidgid.h>
#include <linux/cred.h>
+#include <linux/anon_inodes.h>

#include <linux/kmsg_dump.h>
/* Move somewhere else to avoid recompiling? */
@@ -798,6 +799,180 @@ change_okay:
return old_fsgid;
}

+static int swcreds_release(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+static int swcreds_show_fdinfo(struct seq_file *m, struct file *f)
+{
+ const struct group_info *group_info = f->f_cred->group_info;
+ int ret;
+
+ ret = seq_printf(m, "setcreds: fsuid = %d, fsgid = %d, ngroups = %d\n",
+ from_kuid_munged(current_user_ns(),
+ f->f_cred->fsuid),
+ from_kgid_munged(current_user_ns(),
+ f->f_cred->fsgid),
+ group_info->ngroups);
+ if (!ret) {
+ int i;
+
+ for (i = 0; i < f->f_cred->group_info->ngroups; i++) {
+ ret = seq_printf(m, "altgroup[%d] = %d\n",
+ i,
+ from_kgid_munged(current_user_ns(),
+ GROUP_AT(group_info,
+ i)));
+ if (ret)
+ break;
+ }
+ }
+ return ret;
+}
+#endif
+
+static const struct file_operations swcreds_fops = {
+#ifdef CONFIG_PROC_FS
+ .show_fdinfo = swcreds_show_fdinfo,
+#endif
+ .release = swcreds_release
+};
+
+/* do_switch_creds - set thread creds from struct pointed to by arg
+ *
+ * Does setfsuid, setfsgid, setgroups for thread in one call and one rcu update.
+ * drop special root capabilities as well which are not dropped by setfsuid.
+ * This code is similar to nfsd_setuid in concept.
+ */
+
+static int do_switch_creds(unsigned long arg)
+{
+ const struct user_creds __user *creds;
+ const gid_t __user *alt_groups;
+ const struct cred *old = current_cred();
+ int i, new_fd;
+ struct cred *new;
+ struct user_creds ncred;
+ struct group_info *group_info;
+ int retval;
+
+ creds = (const struct user_creds __user *)arg;
+ /* validate and copyin creds */
+ if (copy_from_user(&ncred, creds, sizeof(struct user_creds)))
+ return -EFAULT;
+
+ if (ncred.ngroups > NGROUPS_MAX)
+ return -EINVAL;
+ new = prepare_creds();
+ if (!new)
+ return -ENOMEM;
+ group_info = groups_alloc(ncred.ngroups);
+ if (!group_info) {
+ abort_creds(new);
+ return -ENOMEM;
+ }
+ new->fsuid = make_kuid(old->user_ns, ncred.uid);
+ new->fsgid = make_kgid(old->user_ns, ncred.gid);
+ alt_groups = &creds->altgroups[0];
+ for (i = 0; i < ncred.ngroups; i++) {
+ kgid_t kgid;
+ gid_t gid;
+
+ if (get_user(gid, alt_groups+i)) {
+ retval = -EFAULT;
+ goto bad_altgrp;
+ }
+ kgid = make_kgid(old->user_ns, gid);
+ if (!gid_valid(kgid)) {
+ retval = -EINVAL;
+ goto bad_altgrp;
+ }
+ GROUP_AT(group_info, i) = kgid;
+ }
+ retval = set_groups(new, group_info);
+ put_group_info(group_info);
+ if (retval < 0) {
+ abort_creds(new);
+ return retval;
+ }
+ /* We need to be the real user in capabilities as well.
+ * don't leak my root caps into the mix */
+ new->cap_effective = cap_drop_nfsd_set(new->cap_effective);
+ old = override_creds(new);
+ /* now open a anon file to preserve these creds
+ * and return the fd
+ */
+ new_fd = anon_inode_getfd("[switch_creds]",
+ &swcreds_fops,
+ NULL,
+ (O_RDONLY|O_CLOEXEC));
+ if (new_fd < 0) {
+ revert_creds(old);
+ abort_creds(new);
+ }
+ return new_fd;
+
+bad_altgrp:
+ put_group_info(group_info);
+ abort_creds(new);
+ return retval;
+}
+
+/**
+ * switch_creds -- Change user (client) file system credentials
+ *
+ * Switch the thread's current file access credentials from the argument.
+ * The end result is the thread has the fsuid, fsgid, altgroups and
+ * non-root capabilities of the user we want to be for subsequent syscalls.
+ *
+ * @cmd SWCREDS_REVERT revert thread's creds to its real creds.
+ * Always returns 0
+ * SWCREDS_FROM_FD override current creds with creds from open file.
+ * Returns 0 or file related error.
+ * SWCREDS_FSIDS set fsuid, fsgid, altgroups from arg
+ * Return fd or error
+ * @arg fd or pointer to creds struct
+ *
+ * The returned fd is a somewhat useless but minimally resource consuming
+ * anon file that can only be used to store the new creds state.
+ *
+ * Return 0, fd, or error.
+ */
+
+SYSCALL_DEFINE2(switch_creds, int, cmd, unsigned long, arg)
+{
+ const struct cred *old;
+
+ old = current_cred();
+ if (!ns_capable(old->user_ns, CAP_SETGID) ||
+ !ns_capable(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
+ if (cmd == SWCREDS_REVERT) {
+ if (current->real_cred != current->cred) {
+ revert_creds(current->real_cred);
+ return 0;
+ }
+ }
+ if (cmd == SWCREDS_FROMFD) {
+ struct fd f;
+ const struct cred *file_cred;
+
+ f = fdget(arg);
+ if (unlikely(!f.file))
+ return -EBADF;
+ file_cred = f.file->f_cred;
+ (void)override_creds(file_cred);
+ fdput(f);
+ return 0;
+ } else if (cmd == SWCREDS_FSIDS) {
+ return do_switch_creds(arg);
+ }
+ return -EINVAL;
+}
+
/**
* sys_getpid - return the thread group id of the current process
*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7078052..7573cad 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -209,3 +209,6 @@ cond_syscall(compat_sys_open_by_handle_at);

/* compare kernel pointers */
cond_syscall(sys_kcmp);
+
+/* switch task creds */
+cond_syscall(sys_switch_creds);
--
1.8.3.1

2013-10-16 22:42:55

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Wed, Oct 16, 2013 at 03:01:57PM -0700, Jim Lieb wrote:
> File servers must do some operations with the credentials of
> their client. This syscall switches the key credentials similar
> to nfsd_setuser() in fs/nfsd/auth.c with the capability of retaining a
> handle to the credentials by way of an fd for an open anonymous file.
> This makes switching for subsequent operations for that client more efficient.

Yet Another Untyped Multiplexor. Inna bun. Onna stick.
CMOT Dibbler special...

Switching creds to those of opener of given file descriptor
is fine, but in any realistic situation you'll get all the real win
from that - you should cache those fds (which you seem to do), and
then setuid/etc. is done once per cache miss. Making the magical
"set them all at once" mess (complete with non-trivial structure,
32/64bit compat, etc.) pointless. Moreover, you don't need any magic
files at all - just set the creds and open /dev/null and there's your fd.
With proper creds associated with it. While we are at it, just _start_
with opening /dev/null. With your initial creds. Voila - revert is
simply switch to that fd's creds.

IOW, you really need only one syscall:

SYSCALL_DEFINE1(switch_cred, int, fd)
{
struct fd f = fdget(fd);
if (!f.file)
return -EBADF;
put_cred(override_creds(f.file->f_cred);
fdput(f);
return 0;
}

and that's all there is to it.

2013-10-17 01:18:39

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

Al Viro <[email protected]> writes:

> On Wed, Oct 16, 2013 at 03:01:57PM -0700, Jim Lieb wrote:
>> File servers must do some operations with the credentials of
>> their client. This syscall switches the key credentials similar
>> to nfsd_setuser() in fs/nfsd/auth.c with the capability of retaining a
>> handle to the credentials by way of an fd for an open anonymous file.
>> This makes switching for subsequent operations for that client more efficient.
>
> Yet Another Untyped Multiplexor. Inna bun. Onna stick.
> CMOT Dibbler special...
>
> Switching creds to those of opener of given file descriptor
> is fine, but in any realistic situation you'll get all the real win
> from that - you should cache those fds (which you seem to do), and
> then setuid/etc. is done once per cache miss. Making the magical
> "set them all at once" mess (complete with non-trivial structure,
> 32/64bit compat, etc.) pointless. Moreover, you don't need any magic
> files at all - just set the creds and open /dev/null and there's your fd.
> With proper creds associated with it. While we are at it, just _start_
> with opening /dev/null. With your initial creds. Voila - revert is
> simply switch to that fd's creds.
>
> IOW, you really need only one syscall:

That doesn't look bad but it does need capable(CAP_SETUID) &&
capable(CAP_SETGID) or possibly something a little more refined.

I don't think we want file descriptor passing to all of a sudden become
a grant of privilege, beyond what the passed fd can do.

> SYSCALL_DEFINE1(switch_cred, int, fd)
> {
> struct fd f = fdget(fd);
> if (!f.file)
> return -EBADF;
> put_cred(override_creds(f.file->f_cred);
> fdput(f);
> return 0;
> }
>
> and that's all there is to it.

Eric

2013-10-17 01:20:58

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:

> That doesn't look bad but it does need capable(CAP_SETUID) &&
> capable(CAP_SETGID) or possibly something a little more refined.

D'oh

> I don't think we want file descriptor passing to all of a sudden become
> a grant of privilege, beyond what the passed fd can do.

Definitely. And an extra ) to make it compile wouldn't hurt either...

2013-10-17 03:36:15

by Jim Lieb

[permalink] [raw]

Subject: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Thursday, October 17, 2013 02:20:50 Al Viro wrote:
> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> > That doesn't look bad but it does need capable(CAP_SETUID) &&
> > capable(CAP_SETGID) or possibly something a little more refined.
>
> D'oh
>
> > I don't think we want file descriptor passing to all of a sudden become
> > a grant of privilege, beyond what the passed fd can do.
>
> Definitely. And an extra ) to make it compile wouldn't hurt either...

Ok, I'll rework this, dropping the void arg etc. How about this:

1. have one arg, the fd, i.e. SYSCALL_DEFINE1(switch_cred, int, fd)

2. if the fd >=0 do the override in my "use the fd" variation. This would do
the capability check after the valid fd check. This means that you must have
privs to mess with privs. Returns 0 or either EBADF or EPERM

3. if the fd == -1 do the revert case. The reason for this is there are 4
syscalls needed to change the creds and each has an error return. We need
a way to escape the damage and a revert to the real creds set is the best way
to return to a known state. This does not require a capability check because
all that can happen is to return to the immutable real set. Also, I don't
need the initial open of /dev/null.

Does this fit?

Jim
--
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

2013-10-17 03:53:10

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

Al Viro <[email protected]> writes:

> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>
>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> capable(CAP_SETGID) or possibly something a little more refined.
>
> D'oh
>
>> I don't think we want file descriptor passing to all of a sudden become
>> a grant of privilege, beyond what the passed fd can do.
>
> Definitely. And an extra ) to make it compile wouldn't hurt either...

There also appears to need to be a check that we don't gain any
capabilities.

We also need a check so that you don't gain any capabilities, and
possibly a few other things.

So I suspect we want a check something like:

if ((new_cred->securebits != current_cred->securebits) ||
(new_cred->cap_inheritable != current_cred->cap_inheritable) ||
(new_cred->cap_permitted != current_cred->cap_permitted) ||
(new_cred->cap_effective != current_cred->cap_effective) ||
(new_cred->cap_bset != current_cred->cap_bset) ||
(new_cred->jit_keyring != current_cred->jit_keyring) ||
(new_cred->session_keyring != current_cred->session_keyring) ||
(new_cred->process_keyring != current_cred->process_keyring) ||
(new_cred->thread_keyring != current_cred->thread_keyring) ||
(new_cred->request_keyring != current_cred->request_keyring) ||
(new_cred->security != current_cred->security) ||
(new_cred->user_ns != current_cred->user_ns)) {
return -EPERM;
}

Eric

2013-10-24 01:14:23

by Andy Lutomirski

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> Al Viro <[email protected]> writes:
>
>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>>
>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>>> capable(CAP_SETGID) or possibly something a little more refined.
>>
>> D'oh
>>
>>> I don't think we want file descriptor passing to all of a sudden become
>>> a grant of privilege, beyond what the passed fd can do.
>>
>> Definitely. And an extra ) to make it compile wouldn't hurt either...
>
> There also appears to need to be a check that we don't gain any
> capabilities.
>
> We also need a check so that you don't gain any capabilities, and
> possibly a few other things.

Why? I like the user_ns part, but I'm not immediately seeing the issue
with capabilities.

>
> So I suspect we want a check something like:
>
> if ((new_cred->securebits != current_cred->securebits) ||
> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> (new_cred->cap_permitted != current_cred->cap_permitted) ||
> (new_cred->cap_effective != current_cred->cap_effective) ||
> (new_cred->cap_bset != current_cred->cap_bset) ||
> (new_cred->jit_keyring != current_cred->jit_keyring) ||
> (new_cred->session_keyring != current_cred->session_keyring) ||
> (new_cred->process_keyring != current_cred->process_keyring) ||
> (new_cred->thread_keyring != current_cred->thread_keyring) ||
> (new_cred->request_keyring != current_cred->request_keyring) ||
> (new_cred->security != current_cred->security) ||
> (new_cred->user_ns != current_cred->user_ns)) {
> return -EPERM;
> }
>

I *really* don't like the idea of being able to use any old file
descriptor. I barely care what rights the caller needs to have to
invoke this -- if you're going to pass an fd that grants a capability
(in the non-Linux sense of the work), please make sure that the sender
actually wants that behavior.

IOW, have a syscall to generate a special fd for this purpose. It's
only a couple lines of code, and I think we'll really regret it if we
fsck this up.

(I will take it as a personal challenge to find at least one exploitable
privilege escalation in this if an arbitrary fd works.)

Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
and faccessat. That is, current userspace can't see it. But now you'll
expose various oddities. For example, AFAICS a capability-less process
that's a userns owner can always use setuid. This will *overwrite*
real_cred. Then you're screwed, especially if this happens by accident.

That being said, Windows has had functions like this for a long time.
Processes have a primary token and possibly an impersonation token. Any
process can call ImpersonateLoggedOnUser (no privilege required) to
impersonate the credentials of a token (which is special kind of fd).
Similarly, any process can call RevertToSelf to undo it.

Is there any actual problem with allowing completely unprivileged tasks
to switch to one of these magic cred fds? That would avoid needing a
"revert" operation.

Another note: I think that there may be issues if the creator of a token
has no_new_privs set and the user doesn't. Imagine a daemon that
accepts one of these fds, impersonates it, and calls exec. This could
be used to escape from no_new_privs land.

--Andy

2013-10-24 06:00:08

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

Andy Lutomirski <[email protected]> writes:

> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> Al Viro <[email protected]> writes:
>>
>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>>>
>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>>>> capable(CAP_SETGID) or possibly something a little more refined.
>>>
>>> D'oh
>>>
>>>> I don't think we want file descriptor passing to all of a sudden become
>>>> a grant of privilege, beyond what the passed fd can do.
>>>
>>> Definitely. And an extra ) to make it compile wouldn't hurt either...
>>
>> There also appears to need to be a check that we don't gain any
>> capabilities.
>>
>> We also need a check so that you don't gain any capabilities, and
>> possibly a few other things.
>
> Why? I like the user_ns part, but I'm not immediately seeing the issue
> with capabilities.

My reasoning was instead of making this syscall as generic as possible
start it out by only allowing the cases Jim cares about and working with
a model where you can't gain any permissions you couldn't gain
otherwise.

Although the fd -1 trick to revert to your other existing cred seems
reasonable.

>> So I suspect we want a check something like:
>>
>> if ((new_cred->securebits != current_cred->securebits) ||
>> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
>> (new_cred->cap_effective != current_cred->cap_effective) ||
>> (new_cred->cap_bset != current_cred->cap_bset) ||
>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
>> (new_cred->session_keyring != current_cred->session_keyring) ||
>> (new_cred->process_keyring != current_cred->process_keyring) ||
>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
>> (new_cred->request_keyring != current_cred->request_keyring) ||
>> (new_cred->security != current_cred->security) ||
>> (new_cred->user_ns != current_cred->user_ns)) {
>> return -EPERM;
>> }
>>
>
> I *really* don't like the idea of being able to use any old file
> descriptor. I barely care what rights the caller needs to have to
> invoke this -- if you're going to pass an fd that grants a capability
> (in the non-Linux sense of the work), please make sure that the sender
> actually wants that behavior.
>
> IOW, have a syscall to generate a special fd for this purpose. It's
> only a couple lines of code, and I think we'll really regret it if we
> fsck this up.
>
> (I will take it as a personal challenge to find at least one exploitable
> privilege escalation in this if an arbitrary fd works.)

If you can't switch to a uid or a gid you couldn't switch to otherwise
then the worst that can happen is an information leak. And information
leaks are rarely directly exploitable.

> Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
> and faccessat. That is, current userspace can't see it. But now you'll
> expose various oddities. For example, AFAICS a capability-less process
> that's a userns owner can always use setuid. This will *overwrite*
> real_cred. Then you're screwed, especially if this happens by
> accident.

And doing in userland what faccessat, and knfsd do in the kernel is
exactly what is desired here. But maybe there are issues with that.

> That being said, Windows has had functions like this for a long time.
> Processes have a primary token and possibly an impersonation token. Any
> process can call ImpersonateLoggedOnUser (no privilege required) to
> impersonate the credentials of a token (which is special kind of fd).
> Similarly, any process can call RevertToSelf to undo it.
>
> Is there any actual problem with allowing completely unprivileged tasks
> to switch to one of these magic cred fds? That would avoid needing a
> "revert" operation.

If the permission model is this switching of credentials doesn't get you
anything you couldn't get some other way. That would seem to totally
rules out unprivileged processes switching these things.

> Another note: I think that there may be issues if the creator of a token
> has no_new_privs set and the user doesn't. Imagine a daemon that
> accepts one of these fds, impersonates it, and calls exec. This could
> be used to escape from no_new_privs land.

Which is why I was suggesting that we don't allow changing any field in
the cred except for uids and gids.

Eric

2013-10-24 19:28:41

by Andy Lutomirski

[permalink] [raw]

Subject: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
<[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>>> Al Viro <[email protected]> writes:
>>>
>>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>>>>
>>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>>>>> capable(CAP_SETGID) or possibly something a little more refined.
>>>>
>>>> D'oh
>>>>
>>>>> I don't think we want file descriptor passing to all of a sudden become
>>>>> a grant of privilege, beyond what the passed fd can do.
>>>>
>>>> Definitely. And an extra ) to make it compile wouldn't hurt either...
>>>
>>> There also appears to need to be a check that we don't gain any
>>> capabilities.
>>>
>>> We also need a check so that you don't gain any capabilities, and
>>> possibly a few other things.
>>
>> Why? I like the user_ns part, but I'm not immediately seeing the issue
>> with capabilities.
>
> My reasoning was instead of making this syscall as generic as possible
> start it out by only allowing the cases Jim cares about and working with
> a model where you can't gain any permissions you couldn't gain
> otherwise.
>
> Although the fd -1 trick to revert to your other existing cred seems
> reasonable.
>
>>> So I suspect we want a check something like:
>>>
>>> if ((new_cred->securebits != current_cred->securebits) ||
>>> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
>>> (new_cred->cap_effective != current_cred->cap_effective) ||
>>> (new_cred->cap_bset != current_cred->cap_bset) ||
>>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
>>> (new_cred->session_keyring != current_cred->session_keyring) ||
>>> (new_cred->process_keyring != current_cred->process_keyring) ||
>>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
>>> (new_cred->request_keyring != current_cred->request_keyring) ||
>>> (new_cred->security != current_cred->security) ||
>>> (new_cred->user_ns != current_cred->user_ns)) {
>>> return -EPERM;
>>> }
>>>
>>
>> I *really* don't like the idea of being able to use any old file
>> descriptor. I barely care what rights the caller needs to have to
>> invoke this -- if you're going to pass an fd that grants a capability
>> (in the non-Linux sense of the work), please make sure that the sender
>> actually wants that behavior.
>>
>> IOW, have a syscall to generate a special fd for this purpose. It's
>> only a couple lines of code, and I think we'll really regret it if we
>> fsck this up.
>>
>> (I will take it as a personal challenge to find at least one exploitable
>> privilege escalation in this if an arbitrary fd works.)
>
> If you can't switch to a uid or a gid you couldn't switch to otherwise
> then the worst that can happen is an information leak. And information
> leaks are rarely directly exploitable.

Here's the attack:

Suppose there's a daemon that uses this in conjunction with
SCM_RIGHTS. The daemon is effectively root (under the current
proposal, it has to be). So a client connects, sends a credential fd,
and the daemon impersonates those credentials.

Now a malicious user obtains *any* fd opened by root. It sends that
fd to the daemon. The daemon then impersonates root. We lose. (It
can't possibly be hard to obtain an fd with highly privileged f_cred
-- I bet that most console programs have stdin like that, for example.
There are probably many setuid programs that will happily open
/dev/null for you, too.)

>
>> Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
>> and faccessat. That is, current userspace can't see it. But now you'll
>> expose various oddities. For example, AFAICS a capability-less process
>> that's a userns owner can always use setuid. This will *overwrite*
>> real_cred. Then you're screwed, especially if this happens by
>> accident.
>
> And doing in userland what faccessat, and knfsd do in the kernel is
> exactly what is desired here. But maybe there are issues with that.
>
>> That being said, Windows has had functions like this for a long time.
>> Processes have a primary token and possibly an impersonation token. Any
>> process can call ImpersonateLoggedOnUser (no privilege required) to
>> impersonate the credentials of a token (which is special kind of fd).
>> Similarly, any process can call RevertToSelf to undo it.
>>
>> Is there any actual problem with allowing completely unprivileged tasks
>> to switch to one of these magic cred fds? That would avoid needing a
>> "revert" operation.
>
> If the permission model is this switching of credentials doesn't get you
> anything you couldn't get some other way. That would seem to totally
> rules out unprivileged processes switching these things.

IMO, there are two reasonable models that involve fds carrying some
kind of credential.

1. The fd actually carries the ability to use the credentials. You
need to be very careful whom you send these to. The security
implications are obvious (which is good) and the receiver doesn't need
privilege (which is good, as long as the receiver is careful).

2. The fd is for identification only. But this means that the fd
carries the ability to identify as a user. So you *still* have to be
very careful about whom you send it to. What you need is an operation
that allows you to identify using the fd without transitively granting
the recipient the same ability. On networks, this is done by signing
some kind of challenge. The kernel could work the same way, or there
could be a new CMSG_IDENTITY that you need an identity fd to send but
that does not copy that fd to the recipient.

>
>> Another note: I think that there may be issues if the creator of a token
>> has no_new_privs set and the user doesn't. Imagine a daemon that
>> accepts one of these fds, impersonates it, and calls exec. This could
>> be used to escape from no_new_privs land.
>
> Which is why I was suggesting that we don't allow changing any field in
> the cred except for uids and gids.

If the daemon impersonates and execs, we still lose.

--Andy

2013-10-24 20:14:19

by Jim Lieb

[permalink] [raw]

Subject: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Wednesday, October 23, 2013 22:59:54 Eric W. Biederman wrote:
> Andy Lutomirski <[email protected]> writes:
> > On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> Al Viro <[email protected]> writes:
> >>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >>>> capable(CAP_SETGID) or possibly something a little more refined.
> >>>
> >>> D'oh
> >>>
> >>>> I don't think we want file descriptor passing to all of a sudden become
> >>>> a grant of privilege, beyond what the passed fd can do.
> >>>
> >>> Definitely. And an extra ) to make it compile wouldn't hurt either...
> >>
> >> There also appears to need to be a check that we don't gain any
> >> capabilities.
> >>
> >> We also need a check so that you don't gain any capabilities, and
> >> possibly a few other things.

My original api does not pass any caps however, the code in setting up the new
creds does mask off a number of them in the effective set, the same set that
knfsd does. So all I can do is reduce, not extend. We need to reduce caps
because without that, quotas and access get bypassed. If I do the generic (4
syscalls), this opens up lots of options for setting things which is not my
need or intent.

> >
> > Why? I like the user_ns part, but I'm not immediately seeing the issue
> > with capabilities.

As for name spaces, we don't have much interest here for two reasons.

1. if the server does run under a different namespace, so be it. The syscall
does make those translations. If you want to firewall off the service and its
exports, fine.

2. in our main use case, nfs-ganesha is running as a pNFS metadata server and
its uid namespace comes from somewhere else (krb5 etc.) anyway. The kernel
etc. it is running on is completely separate, a black box with its own
nsswitch options, probably local only with only a small set of admin entries.

Just to be clear, if there is any mapping between uids within the ganesha env
and the local env (an nfs server that is also an app server), all of the above
applies assuming the nsswitch/idmapper/ipa is set up properly. The how that
happens is outside scope at this level.
>
> My reasoning was instead of making this syscall as generic as possible
> start it out by only allowing the cases Jim cares about and working with
> a model where you can't gain any permissions you couldn't gain
> otherwise.

Right. That is my intent, to do the same as nfsd_setuser. We need to
impersonate the user at the client per the creds we get from the appropriate
authentication service and to only do it for file access, nothing more. The
api I chose allows later extension but only supports what we need now.

I looked at the case of a setuid or seteuid with this and the use case is
NFS/CIFS/9P/... user space file servers. That is why I originally chose an
evil multiplexed call. I could split it into two or three syscalls but my
intent in either api is to have a restricted set of creds. There is a
(potential) new use case in NFSv4.2 labels where we will also have to shove
those down as well, adding another variant. We've got to pass them in somehow
and the set*uid mess is an example of the other way. The setuid family used
to be pretty simple (just two syscalls) but not anymore. At least this is an
extensible model that is consistent and returns useful things like EPERM
rather than an "alternate" uid/gid that must be interpreted to find the error.

I could also restructure the api to have a typed argument. One early pass had
a struct that had everything in it, the op, the fd, and the creds but that has
the pitfall of now freezing the whole api at the syscall entry point. If v4.2
labels or CIFS SIDs must be passed in future, there would be no alternative
but yet another syscall. I do validate the arg. The unsigned long does pass
32/64. The typedefs in the struct match what is used by the other syscalls.
Our server uses the same typedefs internally so the usual porting issues look
minimal (and the same as the setfsuid+setfsgid+setgroups it replaces).

I was also trying to cut down on the number of RCU cleanups here. This is
what nfsd_setuser already does. Caching via using an fd as a handle to the
kernel resource is good and lightweight but our server under load will have
to shed "creds" fds before I/O fds (for lock reasons) and that means the
nfsd_setuser method of setting fsuid, fsgid, altgroups, and restricted caps
could be the common path under load. RCU is cheaper but it is not cheap and
I'd really prefer to not have the fallback make matters worse.

Our use case is a meta-data server part of pNFS server, in particular, pNFS in
a clustered env. This means a higher metadata load where these switches have
real cost.
>
> Although the fd -1 trick to revert to your other existing cred seems
> reasonable.

In my revised api idea, I would still need the revert especially with multiple
set* calls that have failure paths. I may not know what was left so a full
reset to real is the safest choice in the error path. It is also cheap(er) in
the post- file operation return path.

>
> >> So I suspect we want a check something like:
> >>
> >> if ((new_cred->securebits != current_cred->securebits) ||
> >>
> >> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >> (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >> (new_cred->cap_effective != current_cred->cap_effective) ||
> >> (new_cred->cap_bset != current_cred->cap_bset) ||
> >> (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >> (new_cred->session_keyring != current_cred->session_keyring) ||
> >> (new_cred->process_keyring != current_cred->process_keyring) ||
> >> (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >> (new_cred->request_keyring != current_cred->request_keyring) ||
> >> (new_cred->security != current_cred->security) ||
> >> (new_cred->user_ns != current_cred->user_ns)) {
> >>
> >> return -EPERM;
> >>
> >> }
> >
> > I *really* don't like the idea of being able to use any old file
> > descriptor. I barely care what rights the caller needs to have to
> > invoke this -- if you're going to pass an fd that grants a capability
> > (in the non-Linux sense of the work), please make sure that the sender
> > actually wants that behavior.
> >
> > IOW, have a syscall to generate a special fd for this purpose. It's
> > only a couple lines of code, and I think we'll really regret it if we
> > fsck this up.
> >
> > (I will take it as a personal challenge to find at least one exploitable
> > privilege escalation in this if an arbitrary fd works.)
>
> If you can't switch to a uid or a gid you couldn't switch to otherwise
> then the worst that can happen is an information leak. And information
> leaks are rarely directly exploitable.
>
> > Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
> > and faccessat. That is, current userspace can't see it. But now you'll
> > expose various oddities. For example, AFAICS a capability-less process
> > that's a userns owner can always use setuid. This will *overwrite*
> > real_cred. Then you're screwed, especially if this happens by
> > accident.
>
> And doing in userland what faccessat, and knfsd do in the kernel is
> exactly what is desired here. But maybe there are issues with that.
>
> > That being said, Windows has had functions like this for a long time.
> > Processes have a primary token and possibly an impersonation token. Any
> > process can call ImpersonateLoggedOnUser (no privilege required) to
> > impersonate the credentials of a token (which is special kind of fd).
> > Similarly, any process can call RevertToSelf to undo it.
> >
> > Is there any actual problem with allowing completely unprivileged tasks
> > to switch to one of these magic cred fds? That would avoid needing a
> > "revert" operation.
>
> If the permission model is this switching of credentials doesn't get you
> anything you couldn't get some other way. That would seem to totally
> rules out unprivileged processes switching these things.
>
> > Another note: I think that there may be issues if the creator of a token
> > has no_new_privs set and the user doesn't. Imagine a daemon that
> > accepts one of these fds, impersonates it, and calls exec. This could
> > be used to escape from no_new_privs land.
>
> Which is why I was suggesting that we don't allow changing any field in
> the cred except for uids and gids.

Yes. Which is why in my original patch I pass a user_creds structure that
only has the fsuid, fsgid, and altgroups. This means that the caller can
*only* impersonate these particular creds for purposes of file access, not priv
escalation.

As for magic fds, I didn't have but should add to the perms check a test to
validate that the passed fd is one of our "magic" ones. The other users of
anon fds do a similar test to verify that the fd is one that they gave you.
This check would prevent the using of a random open of /dev/null with unknown
caps/creds because it would throw an error, preventing an escalation. This fd
is also opened with O_CLOEXEC so children don't get it. Our use case doesn't
need to fork/exec and if we needed to, we wouldn't be using switch_creds for
it anyway. Adding Close-on-exec removes the exploit opportunity.

Jim

>
> Eric

--
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

2013-10-24 20:24:34

by Jim Lieb

[permalink] [raw]

Subject: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
>
> <[email protected]> wrote:
> > Andy Lutomirski <[email protected]> writes:
> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >>> Al Viro <[email protected]> writes:
> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >>>>
> >>>> D'oh
> >>>>
> >>>>> I don't think we want file descriptor passing to all of a sudden
> >>>>> become
> >>>>> a grant of privilege, beyond what the passed fd can do.
> >>>>
> >>>> Definitely. And an extra ) to make it compile wouldn't hurt either...
> >>>
> >>> There also appears to need to be a check that we don't gain any
> >>> capabilities.
> >>>
> >>> We also need a check so that you don't gain any capabilities, and
> >>> possibly a few other things.
> >>
> >> Why? I like the user_ns part, but I'm not immediately seeing the issue
> >> with capabilities.
> >
> > My reasoning was instead of making this syscall as generic as possible
> > start it out by only allowing the cases Jim cares about and working with
> > a model where you can't gain any permissions you couldn't gain
> > otherwise.
> >
> > Although the fd -1 trick to revert to your other existing cred seems
> > reasonable.
> >
> >>> So I suspect we want a check something like:
> >>>
> >>> if ((new_cred->securebits != current_cred->securebits) ||
> >>>
> >>> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >>> (new_cred->cap_effective != current_cred->cap_effective) ||
> >>> (new_cred->cap_bset != current_cred->cap_bset) ||
> >>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >>> (new_cred->session_keyring != current_cred->session_keyring) ||
> >>> (new_cred->process_keyring != current_cred->process_keyring) ||
> >>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >>> (new_cred->request_keyring != current_cred->request_keyring) ||
> >>> (new_cred->security != current_cred->security) ||
> >>> (new_cred->user_ns != current_cred->user_ns)) {
> >>>
> >>> return -EPERM;
> >>>
> >>> }
> >>
> >> I *really* don't like the idea of being able to use any old file
> >> descriptor. I barely care what rights the caller needs to have to
> >> invoke this -- if you're going to pass an fd that grants a capability
> >> (in the non-Linux sense of the work), please make sure that the sender
> >> actually wants that behavior.
> >>
> >> IOW, have a syscall to generate a special fd for this purpose. It's
> >> only a couple lines of code, and I think we'll really regret it if we
> >> fsck this up.
> >>
> >> (I will take it as a personal challenge to find at least one exploitable
> >> privilege escalation in this if an arbitrary fd works.)
> >
> > If you can't switch to a uid or a gid you couldn't switch to otherwise
> > then the worst that can happen is an information leak. And information
> > leaks are rarely directly exploitable.
>
> Here's the attack:
>
> Suppose there's a daemon that uses this in conjunction with
> SCM_RIGHTS. The daemon is effectively root (under the current
> proposal, it has to be). So a client connects, sends a credential fd,
> and the daemon impersonates those credentials.
>
> Now a malicious user obtains *any* fd opened by root. It sends that
> fd to the daemon. The daemon then impersonates root. We lose. (It
> can't possibly be hard to obtain an fd with highly privileged f_cred
> -- I bet that most console programs have stdin like that, for example.
> There are probably many setuid programs that will happily open
> /dev/null for you, too.)

In my reply to Eric, I note that I need to add a check that the fd passed is
one from switch_creds. With that test, not any fd will do and the one that
does has only been able to set fsuid, fsgid, altgroups, and reduced (the nfsd
set) caps. They can do no more damage than what the original switch_creds
allowed. The any fd by root no longer applies so use doesn't get much (no
escalation).

>
> >> Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
> >> and faccessat. That is, current userspace can't see it. But now you'll
> >> expose various oddities. For example, AFAICS a capability-less process
> >> that's a userns owner can always use setuid. This will *overwrite*
> >> real_cred. Then you're screwed, especially if this happens by
> >> accident.
> >
> > And doing in userland what faccessat, and knfsd do in the kernel is
> > exactly what is desired here. But maybe there are issues with that.
> >
> >> That being said, Windows has had functions like this for a long time.
> >> Processes have a primary token and possibly an impersonation token. Any
> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> impersonate the credentials of a token (which is special kind of fd).
> >> Similarly, any process can call RevertToSelf to undo it.
> >>
> >> Is there any actual problem with allowing completely unprivileged tasks
> >> to switch to one of these magic cred fds? That would avoid needing a
> >> "revert" operation.
> >
> > If the permission model is this switching of credentials doesn't get you
> > anything you couldn't get some other way. That would seem to totally
> > rules out unprivileged processes switching these things.
>
> IMO, there are two reasonable models that involve fds carrying some
> kind of credential.
>
> 1. The fd actually carries the ability to use the credentials. You
> need to be very careful whom you send these to. The security
> implications are obvious (which is good) and the receiver doesn't need
> privilege (which is good, as long as the receiver is careful).

I test for caps. I think using switch_creds in all forms should require
privs. I thought of the revert case and although it does simply return to
"real" creds, you still need CAP_SETUID and CAP_SETGID to do it. This means,
of course, if you have really messed up things, you may not be able to get
back home which, although a bad thing for the process, is a good thing for the
system as a whole.

>
> 2. The fd is for identification only. But this means that the fd
> carries the ability to identify as a user. So you *still* have to be
> very careful about whom you send it to. What you need is an operation
> that allows you to identify using the fd without transitively granting
> the recipient the same ability. On networks, this is done by signing
> some kind of challenge. The kernel could work the same way, or there
> could be a new CMSG_IDENTITY that you need an identity fd to send but
> that does not copy that fd to the recipient.

I am not sure I understand this. CMSG only applies to UNIX_DOMAIN sockets
which means that the switch_creds fd test still applies here. It is
identification but only for within the same kernel. As for namespaces, the
translation was done when the creds fd was created. I suppose if it was
passed across namespace boundaries there could be a problem but what is in the
creds structure is the translated fduid,fsgid etc., not the untranslated one.
We are still doing access checks and quota stuff with the translated creds. If
one namespace has "fred" as uid 52 and another has "mary" as 52, the quota
will only be credited to whoever "fred" really is only if "fred" gets to be
"mary" in the second alternate universe...
>
> >> Another note: I think that there may be issues if the creator of a token
> >> has no_new_privs set and the user doesn't. Imagine a daemon that
> >> accepts one of these fds, impersonates it, and calls exec. This could
> >> be used to escape from no_new_privs land.
> >
> > Which is why I was suggesting that we don't allow changing any field in
> > the cred except for uids and gids.
>
> If the daemon impersonates and execs, we still lose.

I answered this. You only get to impersonate for purposes of file access with
reduced caps to prevent you from being root as well. Also, since these are
O_CLOEXEC and switch_creds "magic" fds, this can't happen because the fd is
gone post-exec.

>
>
> --Andy

--
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

2013-10-31 19:09:32

by Andy Lutomirski

[permalink] [raw]

Subject: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <[email protected]> wrote:
> On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
>> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
>>
>> <[email protected]> wrote:
>> > Andy Lutomirski <[email protected]> writes:
>> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> >>> Al Viro <[email protected]> writes:
>> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
>> >>>>
>> >>>> D'oh
>> >>>>
>> >>>>> I don't think we want file descriptor passing to all of a sudden
>> >>>>> become
>> >>>>> a grant of privilege, beyond what the passed fd can do.
>> >>>>
>> >>>> Definitely. And an extra ) to make it compile wouldn't hurt either...
>> >>>
>> >>> There also appears to need to be a check that we don't gain any
>> >>> capabilities.
>> >>>
>> >>> We also need a check so that you don't gain any capabilities, and
>> >>> possibly a few other things.
>> >>
>> >> Why? I like the user_ns part, but I'm not immediately seeing the issue
>> >> with capabilities.
>> >
>> > My reasoning was instead of making this syscall as generic as possible
>> > start it out by only allowing the cases Jim cares about and working with
>> > a model where you can't gain any permissions you couldn't gain
>> > otherwise.
>> >
>> > Although the fd -1 trick to revert to your other existing cred seems
>> > reasonable.
>> >
>> >>> So I suspect we want a check something like:
>> >>>
>> >>> if ((new_cred->securebits != current_cred->securebits) ||
>> >>>
>> >>> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>> >>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
>> >>> (new_cred->cap_effective != current_cred->cap_effective) ||
>> >>> (new_cred->cap_bset != current_cred->cap_bset) ||
>> >>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
>> >>> (new_cred->session_keyring != current_cred->session_keyring) ||
>> >>> (new_cred->process_keyring != current_cred->process_keyring) ||
>> >>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
>> >>> (new_cred->request_keyring != current_cred->request_keyring) ||
>> >>> (new_cred->security != current_cred->security) ||
>> >>> (new_cred->user_ns != current_cred->user_ns)) {
>> >>>
>> >>> return -EPERM;
>> >>>
>> >>> }
>> >>
>> >> I *really* don't like the idea of being able to use any old file
>> >> descriptor. I barely care what rights the caller needs to have to
>> >> invoke this -- if you're going to pass an fd that grants a capability
>> >> (in the non-Linux sense of the work), please make sure that the sender
>> >> actually wants that behavior.
>> >>
>> >> IOW, have a syscall to generate a special fd for this purpose. It's
>> >> only a couple lines of code, and I think we'll really regret it if we
>> >> fsck this up.
>> >>
>> >> (I will take it as a personal challenge to find at least one exploitable
>> >> privilege escalation in this if an arbitrary fd works.)
>> >
>> > If you can't switch to a uid or a gid you couldn't switch to otherwise
>> > then the worst that can happen is an information leak. And information
>> > leaks are rarely directly exploitable.
>>
>> Here's the attack:
>>
>> Suppose there's a daemon that uses this in conjunction with
>> SCM_RIGHTS. The daemon is effectively root (under the current
>> proposal, it has to be). So a client connects, sends a credential fd,
>> and the daemon impersonates those credentials.
>>
>> Now a malicious user obtains *any* fd opened by root. It sends that
>> fd to the daemon. The daemon then impersonates root. We lose. (It
>> can't possibly be hard to obtain an fd with highly privileged f_cred
>> -- I bet that most console programs have stdin like that, for example.
>> There are probably many setuid programs that will happily open
>> /dev/null for you, too.)
>
> In my reply to Eric, I note that I need to add a check that the fd passed is
> one from switch_creds. With that test, not any fd will do and the one that
> does has only been able to set fsuid, fsgid, altgroups, and reduced (the nfsd
> set) caps. They can do no more damage than what the original switch_creds
> allowed. The any fd by root no longer applies so use doesn't get much (no
> escalation).
>
>>
>> >> Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
>> >> and faccessat. That is, current userspace can't see it. But now you'll
>> >> expose various oddities. For example, AFAICS a capability-less process
>> >> that's a userns owner can always use setuid. This will *overwrite*
>> >> real_cred. Then you're screwed, especially if this happens by
>> >> accident.
>> >
>> > And doing in userland what faccessat, and knfsd do in the kernel is
>> > exactly what is desired here. But maybe there are issues with that.
>> >
>> >> That being said, Windows has had functions like this for a long time.
>> >> Processes have a primary token and possibly an impersonation token. Any
>> >> process can call ImpersonateLoggedOnUser (no privilege required) to
>> >> impersonate the credentials of a token (which is special kind of fd).
>> >> Similarly, any process can call RevertToSelf to undo it.
>> >>
>> >> Is there any actual problem with allowing completely unprivileged tasks
>> >> to switch to one of these magic cred fds? That would avoid needing a
>> >> "revert" operation.
>> >
>> > If the permission model is this switching of credentials doesn't get you
>> > anything you couldn't get some other way. That would seem to totally
>> > rules out unprivileged processes switching these things.
>>
>> IMO, there are two reasonable models that involve fds carrying some
>> kind of credential.
>>
>> 1. The fd actually carries the ability to use the credentials. You
>> need to be very careful whom you send these to. The security
>> implications are obvious (which is good) and the receiver doesn't need
>> privilege (which is good, as long as the receiver is careful).
>
> I test for caps. I think using switch_creds in all forms should require
> privs. I thought of the revert case and although it does simply return to
> "real" creds, you still need CAP_SETUID and CAP_SETGID to do it. This means,
> of course, if you have really messed up things, you may not be able to get
> back home which, although a bad thing for the process, is a good thing for the
> system as a whole.
>
>>
>> 2. The fd is for identification only. But this means that the fd
>> carries the ability to identify as a user. So you *still* have to be
>> very careful about whom you send it to. What you need is an operation
>> that allows you to identify using the fd without transitively granting
>> the recipient the same ability. On networks, this is done by signing
>> some kind of challenge. The kernel could work the same way, or there
>> could be a new CMSG_IDENTITY that you need an identity fd to send but
>> that does not copy that fd to the recipient.
>
> I am not sure I understand this. CMSG only applies to UNIX_DOMAIN sockets
> which means that the switch_creds fd test still applies here. It is
> identification but only for within the same kernel. As for namespaces, the
> translation was done when the creds fd was created. I suppose if it was
> passed across namespace boundaries there could be a problem but what is in the
> creds structure is the translated fduid,fsgid etc., not the untranslated one.
> We are still doing access checks and quota stuff with the translated creds. If
> one namespace has "fred" as uid 52 and another has "mary" as 52, the quota
> will only be credited to whoever "fred" really is only if "fred" gets to be
> "mary" in the second alternate universe...

I'm not talking about namespaces here -- I think we're not really on
the same page. Can you describe what you're envisioning doing with
these fds?

>>
>> >> Another note: I think that there may be issues if the creator of a token
>> >> has no_new_privs set and the user doesn't. Imagine a daemon that
>> >> accepts one of these fds, impersonates it, and calls exec. This could
>> >> be used to escape from no_new_privs land.
>> >
>> > Which is why I was suggesting that we don't allow changing any field in
>> > the cred except for uids and gids.
>>
>> If the daemon impersonates and execs, we still lose.
>
> I answered this. You only get to impersonate for purposes of file access with
> reduced caps to prevent you from being root as well. Also, since these are
> O_CLOEXEC and switch_creds "magic" fds, this can't happen because the fd is
> gone post-exec.

If a no_new_privs process authenticates to a daemon and that daemon
execs, then there's now a dumpable, ptraceable, non-no-new-privs
process running as the uid/gid of the no_new_privs process. This may
be dangerous.

>
>>
>>
>> --Andy
>
> --
> Jim Lieb
> Linux Systems Engineer
> Panasas Inc.
>
> "If ease of use was the only requirement, we would all be riding tricycles"
> - Douglas Engelbart 1925?2013

--
Andy Lutomirski
AMA Capital Management, LLC

2013-10-31 19:49:17

by Andy Lutomirski

[permalink] [raw]

Subject: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Thu, Oct 31, 2013 at 12:43 PM, Jim Lieb <[email protected]> wrote:
> On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote:
>> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <[email protected]> wrote:
>> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
>> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
>> >>
>> >> <[email protected]> wrote:
>> >> > Andy Lutomirski <[email protected]> writes:
>> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> >> >>> Al Viro <[email protected]> writes:
>> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> >> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
>> >> >>>>
>> >> >>>> D'oh
>> >> >>>>
>> >> >>>>> I don't think we want file descriptor passing to all of a sudden
>> >> >>>>> become
>> >> >>>>> a grant of privilege, beyond what the passed fd can do.
>> >> >>>>
>> >> >>>> Definitely. And an extra ) to make it compile wouldn't hurt
>> >> >>>> either...
>> >> >>>
>> >> >>> There also appears to need to be a check that we don't gain any
>> >> >>> capabilities.
>> >> >>>
>> >> >>> We also need a check so that you don't gain any capabilities, and
>> >> >>> possibly a few other things.
>> >> >>
>> >> >> Why? I like the user_ns part, but I'm not immediately seeing the
>> >> >> issue
>> >> >> with capabilities.
>> >> >
>> >> > My reasoning was instead of making this syscall as generic as possible
>> >> > start it out by only allowing the cases Jim cares about and working
>> >> > with
>> >> > a model where you can't gain any permissions you couldn't gain
>> >> > otherwise.
>> >> >
>> >> > Although the fd -1 trick to revert to your other existing cred seems
>> >> > reasonable.
>> >> >
>> >> >>> So I suspect we want a check something like:
>> >> >>>
>> >> >>> if ((new_cred->securebits != current_cred->securebits) ||
>> >> >>>
>> >> >>> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>> >> >>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
>> >> >>> (new_cred->cap_effective != current_cred->cap_effective) ||
>> >> >>> (new_cred->cap_bset != current_cred->cap_bset) ||
>> >> >>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
>> >> >>> (new_cred->session_keyring != current_cred->session_keyring) ||
>> >> >>> (new_cred->process_keyring != current_cred->process_keyring) ||
>> >> >>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
>> >> >>> (new_cred->request_keyring != current_cred->request_keyring) ||
>> >> >>> (new_cred->security != current_cred->security) ||
>> >> >>> (new_cred->user_ns != current_cred->user_ns)) {
>> >> >>>
>> >> >>> return -EPERM;
>> >> >>>
>> >> >>> }
>> >> >>
>> >> >> I *really* don't like the idea of being able to use any old file
>> >> >> descriptor. I barely care what rights the caller needs to have to
>> >> >> invoke this -- if you're going to pass an fd that grants a capability
>> >> >> (in the non-Linux sense of the work), please make sure that the sender
>> >> >> actually wants that behavior.
>> >> >>
>> >> >> IOW, have a syscall to generate a special fd for this purpose. It's
>> >> >> only a couple lines of code, and I think we'll really regret it if we
>> >> >> fsck this up.
>> >> >>
>> >> >> (I will take it as a personal challenge to find at least one
>> >> >> exploitable
>> >> >> privilege escalation in this if an arbitrary fd works.)
>> >> >
>> >> > If you can't switch to a uid or a gid you couldn't switch to otherwise
>> >> > then the worst that can happen is an information leak. And information
>> >> > leaks are rarely directly exploitable.
>> >>
>> >> Here's the attack:
>> >>
>> >> Suppose there's a daemon that uses this in conjunction with
>> >> SCM_RIGHTS. The daemon is effectively root (under the current
>> >> proposal, it has to be). So a client connects, sends a credential fd,
>> >> and the daemon impersonates those credentials.
>> >>
>> >> Now a malicious user obtains *any* fd opened by root. It sends that
>> >> fd to the daemon. The daemon then impersonates root. We lose. (It
>> >> can't possibly be hard to obtain an fd with highly privileged f_cred
>> >> -- I bet that most console programs have stdin like that, for example.
>> >>
>> >> There are probably many setuid programs that will happily open
>> >>
>> >> /dev/null for you, too.)
>> >
>> > In my reply to Eric, I note that I need to add a check that the fd passed
>> > is one from switch_creds. With that test, not any fd will do and the one
>> > that does has only been able to set fsuid, fsgid, altgroups, and reduced
>> > (the nfsd set) caps. They can do no more damage than what the original
>> > switch_creds allowed. The any fd by root no longer applies so use
>> > doesn't get much (no escalation).
>> >
>> >> >> Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
>> >> >> and faccessat. That is, current userspace can't see it. But now
>> >> >> you'll
>> >> >> expose various oddities. For example, AFAICS a capability-less
>> >> >> process
>> >> >> that's a userns owner can always use setuid. This will *overwrite*
>> >> >> real_cred. Then you're screwed, especially if this happens by
>> >> >> accident.
>> >> >
>> >> > And doing in userland what faccessat, and knfsd do in the kernel is
>> >> > exactly what is desired here. But maybe there are issues with that.
>> >> >
>> >> >> That being said, Windows has had functions like this for a long time.
>> >> >> Processes have a primary token and possibly an impersonation token.
>> >> >> Any
>> >> >> process can call ImpersonateLoggedOnUser (no privilege required) to
>> >> >> impersonate the credentials of a token (which is special kind of fd).
>> >> >> Similarly, any process can call RevertToSelf to undo it.
>> >> >>
>> >> >> Is there any actual problem with allowing completely unprivileged
>> >> >> tasks
>> >> >> to switch to one of these magic cred fds? That would avoid needing a
>> >> >> "revert" operation.
>> >> >
>> >> > If the permission model is this switching of credentials doesn't get
>> >> > you
>> >> > anything you couldn't get some other way. That would seem to totally
>> >> > rules out unprivileged processes switching these things.
>> >>
>> >> IMO, there are two reasonable models that involve fds carrying some
>> >> kind of credential.
>> >>
>> >> 1. The fd actually carries the ability to use the credentials. You
>> >> need to be very careful whom you send these to. The security
>> >> implications are obvious (which is good) and the receiver doesn't need
>> >> privilege (which is good, as long as the receiver is careful).
>> >
>> > I test for caps. I think using switch_creds in all forms should require
>> > privs. I thought of the revert case and although it does simply return to
>> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it. This
>> > means, of course, if you have really messed up things, you may not be
>> > able to get back home which, although a bad thing for the process, is a
>> > good thing for the system as a whole.
>> >
>> >> 2. The fd is for identification only. But this means that the fd
>> >> carries the ability to identify as a user. So you *still* have to be
>> >> very careful about whom you send it to. What you need is an operation
>> >> that allows you to identify using the fd without transitively granting
>> >> the recipient the same ability. On networks, this is done by signing
>> >> some kind of challenge. The kernel could work the same way, or there
>> >> could be a new CMSG_IDENTITY that you need an identity fd to send but
>> >> that does not copy that fd to the recipient.
>> >
>> > I am not sure I understand this. CMSG only applies to UNIX_DOMAIN sockets
>> > which means that the switch_creds fd test still applies here. It is
>> > identification but only for within the same kernel. As for namespaces,
>> > the
>> > translation was done when the creds fd was created. I suppose if it was
>> > passed across namespace boundaries there could be a problem but what is in
>> > the creds structure is the translated fduid,fsgid etc., not the
>> > untranslated one. We are still doing access checks and quota stuff with
>> > the translated creds. If one namespace has "fred" as uid 52 and another
>> > has "mary" as 52, the quota will only be credited to whoever "fred"
>> > really is only if "fred" gets to be "mary" in the second alternate
>> > universe...
>>
>> I'm not talking about namespaces here -- I think we're not really on
>> the same page. Can you describe what you're envisioning doing with
>> these fds?
>
> I have a new version that is now two syscalls and no multiplexing has more
> testing etc. to mirror exactly the tests in setfsuid(). I still am testing
> but plan to send this new patchset next week for review.
>
> Ok, I may have missed something. What I meant to say is that I'm using the
> same namespace functions for switch_creds as all the set*uid syscalls use so
> what I have should work as well. If one were to pass this fd via CMSG, it
> could cross a namespace boundary. The changed mappings of this fd would be
> the same as for any other fd passed across now. Maybe that is irrelevant in
> this case.
>
> The purpose of the fd is to create a handle to a creds set that a server can
> then efficiently switch to prior to filesystem syscalls where fsuid etc. are
> relevant (mkdir, creat, write, etc.). This is exposing the override_creds()
> and revert_creds() to these servers. I do not intend in our server to hand
> these fds to anyone else. After all, they are useless for anything other than
> switch_creds/use_creds.
>
> The server keeps a cache of its known, currently active client creds along
> with this fd. The fast path is to lookup the creds and use the fd. On cache
> misses, it does a switch_creds() and saves the fd in the cache.

So if I understand you correctly, you're not planning on sending this
fd between process at all. In that case, adding a new API that seems
designed for interprocess use and having exactly zero usecases to
think about makes me nervous. Why is it going to use fds again?

Or maybe I should wait to see the updated API before complaining :)

>
>>
>> >> >> Another note: I think that there may be issues if the creator of a
>> >> >> token
>> >> >> has no_new_privs set and the user doesn't. Imagine a daemon that
>> >> >> accepts one of these fds, impersonates it, and calls exec. This could
>> >> >> be used to escape from no_new_privs land.
>> >> >
>> >> > Which is why I was suggesting that we don't allow changing any field in
>> >> > the cred except for uids and gids.
>> >>
>> >> If the daemon impersonates and execs, we still lose.
>> >
>> > I answered this. You only get to impersonate for purposes of file access
>> > with reduced caps to prevent you from being root as well. Also, since
>> > these are O_CLOEXEC and switch_creds "magic" fds, this can't happen
>> > because the fd is gone post-exec.
>>
>> If a no_new_privs process authenticates to a daemon and that daemon
>> execs, then there's now a dumpable, ptraceable, non-no-new-privs
>> process running as the uid/gid of the no_new_privs process. This may
>> be dangerous.
>
> Two things. My new patch set now throws an error if the fd is not one
> returned by switch_creds(). The new syscall here is use_creds() btw. That fd
> is opened O_CLOEXEC so the syscall has two guards. First, it can only be one
> that switch_creds() returned and second, it gets closed on an exec so the
> no_new_privs process doesn't get it. In addition, switch_creds() reduces the
> effective caps set to the same minimal set that nfsd_setuser() has.
>
> If someone else chooses to pass this fd via CMSG, whoever gets it has reduced
> privs and in order to use it for anything, i.e. use_creds(), it still needs to
> inherit SETUID and SETGID caps from its parent or it will get an EPERM error.
> Passing this in "friendly" code defeats the purpose of the cache above in that
> we could get fd leaks. If there is another flag that could prevent its being
> passed along via CMSG, I can add that too because using CMSG is well outside
> the use case.

I still don't see the point of lowering effective caps, but this is
IMO minor. Are you checking the permitted set on revert?

--Andy

2013-10-31 19:56:00

by Jim Lieb

[permalink] [raw]

Subject: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote:
> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <[email protected]> wrote:
> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
> >>
> >> <[email protected]> wrote:
> >> > Andy Lutomirski <[email protected]> writes:
> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> >>> Al Viro <[email protected]> writes:
> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >> >>>>
> >> >>>> D'oh
> >> >>>>
> >> >>>>> I don't think we want file descriptor passing to all of a sudden
> >> >>>>> become
> >> >>>>> a grant of privilege, beyond what the passed fd can do.
> >> >>>>
> >> >>>> Definitely. And an extra ) to make it compile wouldn't hurt
> >> >>>> either...
> >> >>>
> >> >>> There also appears to need to be a check that we don't gain any
> >> >>> capabilities.
> >> >>>
> >> >>> We also need a check so that you don't gain any capabilities, and
> >> >>> possibly a few other things.
> >> >>
> >> >> Why? I like the user_ns part, but I'm not immediately seeing the
> >> >> issue
> >> >> with capabilities.
> >> >
> >> > My reasoning was instead of making this syscall as generic as possible
> >> > start it out by only allowing the cases Jim cares about and working
> >> > with
> >> > a model where you can't gain any permissions you couldn't gain
> >> > otherwise.
> >> >
> >> > Although the fd -1 trick to revert to your other existing cred seems
> >> > reasonable.
> >> >
> >> >>> So I suspect we want a check something like:
> >> >>>
> >> >>> if ((new_cred->securebits != current_cred->securebits) ||
> >> >>>
> >> >>> (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >> >>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >> >>> (new_cred->cap_effective != current_cred->cap_effective) ||
> >> >>> (new_cred->cap_bset != current_cred->cap_bset) ||
> >> >>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >> >>> (new_cred->session_keyring != current_cred->session_keyring) ||
> >> >>> (new_cred->process_keyring != current_cred->process_keyring) ||
> >> >>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >> >>> (new_cred->request_keyring != current_cred->request_keyring) ||
> >> >>> (new_cred->security != current_cred->security) ||
> >> >>> (new_cred->user_ns != current_cred->user_ns)) {
> >> >>>
> >> >>> return -EPERM;
> >> >>>
> >> >>> }
> >> >>
> >> >> I *really* don't like the idea of being able to use any old file
> >> >> descriptor. I barely care what rights the caller needs to have to
> >> >> invoke this -- if you're going to pass an fd that grants a capability
> >> >> (in the non-Linux sense of the work), please make sure that the sender
> >> >> actually wants that behavior.
> >> >>
> >> >> IOW, have a syscall to generate a special fd for this purpose. It's
> >> >> only a couple lines of code, and I think we'll really regret it if we
> >> >> fsck this up.
> >> >>
> >> >> (I will take it as a personal challenge to find at least one
> >> >> exploitable
> >> >> privilege escalation in this if an arbitrary fd works.)
> >> >
> >> > If you can't switch to a uid or a gid you couldn't switch to otherwise
> >> > then the worst that can happen is an information leak. And information
> >> > leaks are rarely directly exploitable.
> >>
> >> Here's the attack:
> >>
> >> Suppose there's a daemon that uses this in conjunction with
> >> SCM_RIGHTS. The daemon is effectively root (under the current
> >> proposal, it has to be). So a client connects, sends a credential fd,
> >> and the daemon impersonates those credentials.
> >>
> >> Now a malicious user obtains *any* fd opened by root. It sends that
> >> fd to the daemon. The daemon then impersonates root. We lose. (It
> >> can't possibly be hard to obtain an fd with highly privileged f_cred
> >> -- I bet that most console programs have stdin like that, for example.
> >>
> >> There are probably many setuid programs that will happily open
> >>
> >> /dev/null for you, too.)
> >
> > In my reply to Eric, I note that I need to add a check that the fd passed
> > is one from switch_creds. With that test, not any fd will do and the one
> > that does has only been able to set fsuid, fsgid, altgroups, and reduced
> > (the nfsd set) caps. They can do no more damage than what the original
> > switch_creds allowed. The any fd by root no longer applies so use
> > doesn't get much (no escalation).
> >
> >> >> Also... real_cred looks confusing. AFAICS it is used *only* for knfsd
> >> >> and faccessat. That is, current userspace can't see it. But now
> >> >> you'll
> >> >> expose various oddities. For example, AFAICS a capability-less
> >> >> process
> >> >> that's a userns owner can always use setuid. This will *overwrite*
> >> >> real_cred. Then you're screwed, especially if this happens by
> >> >> accident.
> >> >
> >> > And doing in userland what faccessat, and knfsd do in the kernel is
> >> > exactly what is desired here. But maybe there are issues with that.
> >> >
> >> >> That being said, Windows has had functions like this for a long time.
> >> >> Processes have a primary token and possibly an impersonation token.
> >> >> Any
> >> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> >> impersonate the credentials of a token (which is special kind of fd).
> >> >> Similarly, any process can call RevertToSelf to undo it.
> >> >>
> >> >> Is there any actual problem with allowing completely unprivileged
> >> >> tasks
> >> >> to switch to one of these magic cred fds? That would avoid needing a
> >> >> "revert" operation.
> >> >
> >> > If the permission model is this switching of credentials doesn't get
> >> > you
> >> > anything you couldn't get some other way. That would seem to totally
> >> > rules out unprivileged processes switching these things.
> >>
> >> IMO, there are two reasonable models that involve fds carrying some
> >> kind of credential.
> >>
> >> 1. The fd actually carries the ability to use the credentials. You
> >> need to be very careful whom you send these to. The security
> >> implications are obvious (which is good) and the receiver doesn't need
> >> privilege (which is good, as long as the receiver is careful).
> >
> > I test for caps. I think using switch_creds in all forms should require
> > privs. I thought of the revert case and although it does simply return to
> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it. This
> > means, of course, if you have really messed up things, you may not be
> > able to get back home which, although a bad thing for the process, is a
> > good thing for the system as a whole.
> >
> >> 2. The fd is for identification only. But this means that the fd
> >> carries the ability to identify as a user. So you *still* have to be
> >> very careful about whom you send it to. What you need is an operation
> >> that allows you to identify using the fd without transitively granting
> >> the recipient the same ability. On networks, this is done by signing
> >> some kind of challenge. The kernel could work the same way, or there
> >> could be a new CMSG_IDENTITY that you need an identity fd to send but
> >> that does not copy that fd to the recipient.
> >
> > I am not sure I understand this. CMSG only applies to UNIX_DOMAIN sockets
> > which means that the switch_creds fd test still applies here. It is
> > identification but only for within the same kernel. As for namespaces,
> > the
> > translation was done when the creds fd was created. I suppose if it was
> > passed across namespace boundaries there could be a problem but what is in
> > the creds structure is the translated fduid,fsgid etc., not the
> > untranslated one. We are still doing access checks and quota stuff with
> > the translated creds. If one namespace has "fred" as uid 52 and another
> > has "mary" as 52, the quota will only be credited to whoever "fred"
> > really is only if "fred" gets to be "mary" in the second alternate
> > universe...
>
> I'm not talking about namespaces here -- I think we're not really on
> the same page. Can you describe what you're envisioning doing with
> these fds?

I have a new version that is now two syscalls and no multiplexing has more
testing etc. to mirror exactly the tests in setfsuid(). I still am testing
but plan to send this new patchset next week for review.

Ok, I may have missed something. What I meant to say is that I'm using the
same namespace functions for switch_creds as all the set*uid syscalls use so
what I have should work as well. If one were to pass this fd via CMSG, it
could cross a namespace boundary. The changed mappings of this fd would be
the same as for any other fd passed across now. Maybe that is irrelevant in
this case.

The purpose of the fd is to create a handle to a creds set that a server can
then efficiently switch to prior to filesystem syscalls where fsuid etc. are
relevant (mkdir, creat, write, etc.). This is exposing the override_creds()
and revert_creds() to these servers. I do not intend in our server to hand
these fds to anyone else. After all, they are useless for anything other than
switch_creds/use_creds.

The server keeps a cache of its known, currently active client creds along
with this fd. The fast path is to lookup the creds and use the fd. On cache
misses, it does a switch_creds() and saves the fd in the cache.

>
> >> >> Another note: I think that there may be issues if the creator of a
> >> >> token
> >> >> has no_new_privs set and the user doesn't. Imagine a daemon that
> >> >> accepts one of these fds, impersonates it, and calls exec. This could
> >> >> be used to escape from no_new_privs land.
> >> >
> >> > Which is why I was suggesting that we don't allow changing any field in
> >> > the cred except for uids and gids.
> >>
> >> If the daemon impersonates and execs, we still lose.
> >
> > I answered this. You only get to impersonate for purposes of file access
> > with reduced caps to prevent you from being root as well. Also, since
> > these are O_CLOEXEC and switch_creds "magic" fds, this can't happen
> > because the fd is gone post-exec.
>
> If a no_new_privs process authenticates to a daemon and that daemon
> execs, then there's now a dumpable, ptraceable, non-no-new-privs
> process running as the uid/gid of the no_new_privs process. This may
> be dangerous.

Two things. My new patch set now throws an error if the fd is not one
returned by switch_creds(). The new syscall here is use_creds() btw. That fd
is opened O_CLOEXEC so the syscall has two guards. First, it can only be one
that switch_creds() returned and second, it gets closed on an exec so the
no_new_privs process doesn't get it. In addition, switch_creds() reduces the
effective caps set to the same minimal set that nfsd_setuser() has.

If someone else chooses to pass this fd via CMSG, whoever gets it has reduced
privs and in order to use it for anything, i.e. use_creds(), it still needs to
inherit SETUID and SETGID caps from its parent or it will get an EPERM error.
Passing this in "friendly" code defeats the purpose of the cache above in that
we could get fd leaks. If there is another flag that could prevent its being
passed along via CMSG, I can add that too because using CMSG is well outside
the use case.

Note too, it also cannot pass it on via exec. The use case is and should be
as for knfsd, namely which calls nfsd_setuser() followed by fulfilling the
protocol op followed by a revert of creds. All this is is in the same thread.

Jim
>
> >> --Andy
> >
> > --
> > Jim Lieb
> > Linux Systems Engineer
> > Panasas Inc.
> >
> > "If ease of use was the only requirement, we would all be riding
> > tricycles"
> > - Douglas Engelbart 1925?2013

--
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925?2013

2013-10-31 20:40:15

by Jim Lieb

[permalink] [raw]

Subject: Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops

On Thursday, October 31, 2013 12:48:54 Andy Lutomirski wrote:
> On Thu, Oct 31, 2013 at 12:43 PM, Jim Lieb <[email protected]> wrote:
> > On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote:
> >> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <[email protected]> wrote:
> >> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> >> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
> >> >>
> >> >> <[email protected]> wrote:
> >> >> > Andy Lutomirski <[email protected]> writes:
> >> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> >> >>> Al Viro <[email protected]> writes:
> >> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman
wrote:
> >> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >> >> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >> >> >>>>
> >> >> >>>> D'oh
> >> >> >>>>
> >> >> >>>>> I don't think we want file descriptor passing to all of a sudden
> >> >> >>>>> become
> >> >> >>>>> a grant of privilege, beyond what the passed fd can do.
> >> >> >>>>
> >> >> >>>> Definitely. And an extra ) to make it compile wouldn't hurt
> >> >> >>>> either...
> >> >> >>>
> >> >> >>> There also appears to need to be a check that we don't gain any
> >> >> >>> capabilities.
> >> >> >>>
> >> >> >>> We also need a check so that you don't gain any capabilities, and
> >> >> >>> possibly a few other things.
> >> >> >>
> >> >> >> Why? I like the user_ns part, but I'm not immediately seeing the
> >> >> >> issue
> >> >> >> with capabilities.
> >> >> >
> >> >> > My reasoning was instead of making this syscall as generic as
> >> >> > possible
> >> >> > start it out by only allowing the cases Jim cares about and working
> >> >> > with
> >> >> > a model where you can't gain any permissions you couldn't gain
> >> >> > otherwise.
> >> >> >
> >> >> > Although the fd -1 trick to revert to your other existing cred seems
> >> >> > reasonable.
> >> >> >
> >> >> >>> So I suspect we want a check something like:
> >> >> >>>
> >> >> >>> if ((new_cred->securebits != current_cred->securebits) ||
> >> >> >>>
> >> >> >>> (new_cred->cap_inheritable != current_cred->cap_inheritable)
> >> >> >>> ||
> >> >> >>> (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >> >> >>> (new_cred->cap_effective != current_cred->cap_effective) ||
> >> >> >>> (new_cred->cap_bset != current_cred->cap_bset) ||
> >> >> >>> (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >> >> >>> (new_cred->session_keyring != current_cred->session_keyring)
> >> >> >>> ||
> >> >> >>> (new_cred->process_keyring != current_cred->process_keyring)
> >> >> >>> ||
> >> >> >>> (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >> >> >>> (new_cred->request_keyring != current_cred->request_keyring)
> >> >> >>> ||
> >> >> >>> (new_cred->security != current_cred->security) ||
> >> >> >>> (new_cred->user_ns != current_cred->user_ns)) {
> >> >> >>>
> >> >> >>> return -EPERM;
> >> >> >>>
> >> >> >>> }
> >> >> >>
> >> >> >> I *really* don't like the idea of being able to use any old file
> >> >> >> descriptor. I barely care what rights the caller needs to have to
> >> >> >> invoke this -- if you're going to pass an fd that grants a
> >> >> >> capability
> >> >> >> (in the non-Linux sense of the work), please make sure that the
> >> >> >> sender
> >> >> >> actually wants that behavior.
> >> >> >>
> >> >> >> IOW, have a syscall to generate a special fd for this purpose.
> >> >> >> It's
> >> >> >> only a couple lines of code, and I think we'll really regret it if
> >> >> >> we
> >> >> >> fsck this up.
> >> >> >>
> >> >> >> (I will take it as a personal challenge to find at least one
> >> >> >> exploitable
> >> >> >> privilege escalation in this if an arbitrary fd works.)
> >> >> >
> >> >> > If you can't switch to a uid or a gid you couldn't switch to
> >> >> > otherwise
> >> >> > then the worst that can happen is an information leak. And
> >> >> > information
> >> >> > leaks are rarely directly exploitable.
> >> >>
> >> >> Here's the attack:
> >> >>
> >> >> Suppose there's a daemon that uses this in conjunction with
> >> >> SCM_RIGHTS. The daemon is effectively root (under the current
> >> >> proposal, it has to be). So a client connects, sends a credential fd,
> >> >> and the daemon impersonates those credentials.
> >> >>
> >> >> Now a malicious user obtains *any* fd opened by root. It sends that
> >> >> fd to the daemon. The daemon then impersonates root. We lose. (It
> >> >> can't possibly be hard to obtain an fd with highly privileged f_cred
> >> >> -- I bet that most console programs have stdin like that, for example.
> >> >>
> >> >> There are probably many setuid programs that will happily open
> >> >>
> >> >> /dev/null for you, too.)
> >> >
> >> > In my reply to Eric, I note that I need to add a check that the fd
> >> > passed
> >> > is one from switch_creds. With that test, not any fd will do and the
> >> > one
> >> > that does has only been able to set fsuid, fsgid, altgroups, and
> >> > reduced
> >> > (the nfsd set) caps. They can do no more damage than what the original
> >> > switch_creds allowed. The any fd by root no longer applies so use
> >> > doesn't get much (no escalation).
> >> >
> >> >> >> Also... real_cred looks confusing. AFAICS it is used *only* for
> >> >> >> knfsd
> >> >> >> and faccessat. That is, current userspace can't see it. But now
> >> >> >> you'll
> >> >> >> expose various oddities. For example, AFAICS a capability-less
> >> >> >> process
> >> >> >> that's a userns owner can always use setuid. This will *overwrite*
> >> >> >> real_cred. Then you're screwed, especially if this happens by
> >> >> >> accident.
> >> >> >
> >> >> > And doing in userland what faccessat, and knfsd do in the kernel is
> >> >> > exactly what is desired here. But maybe there are issues with that.
> >> >> >
> >> >> >> That being said, Windows has had functions like this for a long
> >> >> >> time.
> >> >> >> Processes have a primary token and possibly an impersonation token.
> >> >> >> Any
> >> >> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> >> >> impersonate the credentials of a token (which is special kind of
> >> >> >> fd).
> >> >> >> Similarly, any process can call RevertToSelf to undo it.
> >> >> >>
> >> >> >> Is there any actual problem with allowing completely unprivileged
> >> >> >> tasks
> >> >> >> to switch to one of these magic cred fds? That would avoid needing
> >> >> >> a
> >> >> >> "revert" operation.
> >> >> >
> >> >> > If the permission model is this switching of credentials doesn't get
> >> >> > you
> >> >> > anything you couldn't get some other way. That would seem to
> >> >> > totally
> >> >> > rules out unprivileged processes switching these things.
> >> >>
> >> >> IMO, there are two reasonable models that involve fds carrying some
> >> >> kind of credential.
> >> >>
> >> >> 1. The fd actually carries the ability to use the credentials. You
> >> >> need to be very careful whom you send these to. The security
> >> >> implications are obvious (which is good) and the receiver doesn't need
> >> >> privilege (which is good, as long as the receiver is careful).
> >> >
> >> > I test for caps. I think using switch_creds in all forms should
> >> > require
> >> > privs. I thought of the revert case and although it does simply return
> >> > to
> >> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it. This
> >> > means, of course, if you have really messed up things, you may not be
> >> > able to get back home which, although a bad thing for the process, is a
> >> > good thing for the system as a whole.
> >> >
> >> >> 2. The fd is for identification only. But this means that the fd
> >> >> carries the ability to identify as a user. So you *still* have to be
> >> >> very careful about whom you send it to. What you need is an operation
> >> >> that allows you to identify using the fd without transitively granting
> >> >> the recipient the same ability. On networks, this is done by signing
> >> >> some kind of challenge. The kernel could work the same way, or there
> >> >> could be a new CMSG_IDENTITY that you need an identity fd to send but
> >> >> that does not copy that fd to the recipient.
> >> >
> >> > I am not sure I understand this. CMSG only applies to UNIX_DOMAIN
> >> > sockets
> >> > which means that the switch_creds fd test still applies here. It is
> >> > identification but only for within the same kernel. As for namespaces,
> >> > the
> >> > translation was done when the creds fd was created. I suppose if it
> >> > was
> >> > passed across namespace boundaries there could be a problem but what is
> >> > in
> >> > the creds structure is the translated fduid,fsgid etc., not the
> >> > untranslated one. We are still doing access checks and quota stuff with
> >> > the translated creds. If one namespace has "fred" as uid 52 and
> >> > another
> >> > has "mary" as 52, the quota will only be credited to whoever "fred"
> >> > really is only if "fred" gets to be "mary" in the second alternate
> >> > universe...
> >>
> >> I'm not talking about namespaces here -- I think we're not really on
> >> the same page. Can you describe what you're envisioning doing with
> >> these fds?
> >
> > I have a new version that is now two syscalls and no multiplexing has more
> > testing etc. to mirror exactly the tests in setfsuid(). I still am
> > testing
> > but plan to send this new patchset next week for review.
> >
> > Ok, I may have missed something. What I meant to say is that I'm using
> > the same namespace functions for switch_creds as all the set*uid syscalls
> > use so what I have should work as well. If one were to pass this fd via
> > CMSG, it could cross a namespace boundary. The changed mappings of this
> > fd would be the same as for any other fd passed across now. Maybe that
> > is irrelevant in this case.
> >
> > The purpose of the fd is to create a handle to a creds set that a server
> > can then efficiently switch to prior to filesystem syscalls where fsuid
> > etc. are relevant (mkdir, creat, write, etc.). This is exposing the
> > override_creds() and revert_creds() to these servers. I do not intend in
> > our server to hand these fds to anyone else. After all, they are useless
> > for anything other than switch_creds/use_creds.
> >
> > The server keeps a cache of its known, currently active client creds along
> > with this fd. The fast path is to lookup the creds and use the fd. On
> > cache misses, it does a switch_creds() and saves the fd in the cache.
>
> So if I understand you correctly, you're not planning on sending this
> fd between process at all. In that case, adding a new API that seems
> designed for interprocess use and having exactly zero usecases to
> think about makes me nervous. Why is it going to use fds again?

Correct. They are handles to a set of creds for filesystem calls. I use fds
because every filp has a pointer to a set of creds that gets swapped in for
various things like coredumps, and in the case of knfsd, impersonating the nfs
client for when it does filesystem morphing ops. In order to currently do what
nfsd_setuser() does we must do:

setfsuid();setfsgid(); setgroups(); setcaps();
followed by
open/creat/mknod/write
followed by
setfsuid(); setfsgid(); setgroups(); setcaps();

Each one of these 4 syscalls discards an RCU copy of the creds, both going in
and going out. In the first case below, the revert doesn't recycle the RCU
because the open fd still holds a ref. Same applied to the second case. In
fact here, no new creds are created either. The creds only get recycled when
the fd is closed.

Using an fd to have a handle on what all these do is the simplest thing to
pass back and forth between user and kernel space. This is what Ted T'so and
Al Viro suggested. This ends up looking like:

fd = switch_creds(creds struct);
followed by
open/creat/mknod/write
followed by
use_creds(-1);

The -1 arg is used as I proposed in the earlier round, namely, it tells
use_creds to revert to real creds.

Subsequent uses look like:

use_creds(cached fd);
followed by
open/creat/mknod/write
followed by
use_creds(-1);

There is no need to pass the fd because if the other side has the caps to use
the fd, they have the caps to get their own. We are saving overhead.
>
> Or maybe I should wait to see the updated API before complaining :)
>
> >> >> >> Another note: I think that there may be issues if the creator of a
> >> >> >> token
> >> >> >> has no_new_privs set and the user doesn't. Imagine a daemon that
> >> >> >> accepts one of these fds, impersonates it, and calls exec. This
> >> >> >> could
> >> >> >> be used to escape from no_new_privs land.
> >> >> >
> >> >> > Which is why I was suggesting that we don't allow changing any field
> >> >> > in
> >> >> > the cred except for uids and gids.
> >> >>
> >> >> If the daemon impersonates and execs, we still lose.
> >> >
> >> > I answered this. You only get to impersonate for purposes of file
> >> > access
> >> > with reduced caps to prevent you from being root as well. Also, since
> >> > these are O_CLOEXEC and switch_creds "magic" fds, this can't happen
> >> > because the fd is gone post-exec.
> >>
> >> If a no_new_privs process authenticates to a daemon and that daemon
> >> execs, then there's now a dumpable, ptraceable, non-no-new-privs
> >> process running as the uid/gid of the no_new_privs process. This may
> >> be dangerous.
> >
> > Two things. My new patch set now throws an error if the fd is not one
> > returned by switch_creds(). The new syscall here is use_creds() btw.
> > That fd is opened O_CLOEXEC so the syscall has two guards. First, it can
> > only be one that switch_creds() returned and second, it gets closed on an
> > exec so the no_new_privs process doesn't get it. In addition,
> > switch_creds() reduces the effective caps set to the same minimal set
> > that nfsd_setuser() has.
> >
> > If someone else chooses to pass this fd via CMSG, whoever gets it has
> > reduced privs and in order to use it for anything, i.e. use_creds(), it
> > still needs to inherit SETUID and SETGID caps from its parent or it will
> > get an EPERM error. Passing this in "friendly" code defeats the purpose
> > of the cache above in that we could get fd leaks. If there is another
> > flag that could prevent its being passed along via CMSG, I can add that
> > too because using CMSG is well outside the use case.
>
> I still don't see the point of lowering effective caps, but this is
> IMO minor. Are you checking the permitted set on revert?

We lower specific filesystem related caps to prevent using the root bypass of
acess checks, prevent root bypass of quota restrictions, and to correctly
charge the impersonated user for quota'd resource usage. This is what
nfsd_setuser() does. I use the same mask set.

I don't check the permitted set associated with the fd because I've already
checked that it is a switch_creds() returned fd (they are immutable). I do
make a check at the beginning of both syscalls for the caps to set uid/gid,
the same test as setfsuid et al.

>
> --Andy

--
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013