2010-04-27 20:43:44

by Serge E. Hallyn

Subject: [PATCH 0/3] p9auth fs: introduction

Hi,

Here is an updated version of the p9auth setuid capability
module, which hopefully addresses all previous feedback. It
is now a separate filesystem instead of a device, as per Eric's
suggestion.

During the last round, Alan Cox made a great suggestion: send
credentials over an AF_UNIX socket, allowing the recipient to 'become me.'
I think that's still an interesting idea, and intend to pursue it
as Eric pushes the patches to translate userids across user
namespaces.

The tradeoffs are worth discussing. On the one
hand, p9auth requires a scary CAP_GRANT_ID capability, while a
SO_PASSAUTH would be more akin to an extension of CAP_SETUID.
Also, SO_PASSAUTH would be usable by any unprivileged app, while
one would hope there would be only one p9auth service for the whole
system. On the other hand, the p9auth API appears to be pretty
well settled and understood, and only provides for a very simple
setting of all uids and all gids to one value, plus some auxiliary
groups, which is perfect for use by simple login servers.

p9auth and SO_PASSAUTH don't appear to be mutually exclusive. I don't
know how painful it would be for plan-9 folks to make use of the
SO_PASSAUTH feature. (It should definitely be possible.) But in any
case here is the next iteration of p9auth fs for discussion and
consideration.

thanks,
-serge


2010-04-27 20:43:50

by Serge E. Hallyn

Subject: [PATCH 1/3] p9auth: split core function out of some set*{u,g}id functions

Break the core functionality of set{fs,res}{u,g}id into cred_setX
which performs the access checks based on current_cred(), but performs
the requested change on a passed-in cred.

Export the helpers, since p9auth can be compiled as a module. It
might be worth not allowing modular p9auth to avoid having to export
them.

Really the setfs{u,g}id helper isn't needed, but move it as
well to keep the code consistent.

This patch also changes set_user() to use new->user->user_ns. While
technically not needed as all callers should have new->user->user_ns
equal to current_userns(), it is more correct and may prevent surprises
in the future.

Changelog:
Apr 24: (David Howells) make cred_setresuid etc extern, and
document the helpers in Documentation/credentials.txt.

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: David Howells <[email protected]>
---
Documentation/credentials.txt | 18 ++++++
include/linux/cred.h | 12 ++++
kernel/cred.c | 119 +++++++++++++++++++++++++++++++++++++
kernel/sys.c | 131 ++++++----------------------------------
4 files changed, 169 insertions(+), 111 deletions(-)

diff --git a/Documentation/credentials.txt b/Documentation/credentials.txt
index df03169..7da876b 100644
--- a/Documentation/credentials.txt
+++ b/Documentation/credentials.txt
@@ -529,6 +529,24 @@ A typical credentials alteration function would look something like this:
return commit_creds(new);
}

+SETUID/SETGID HELPERS
+---------------------
+
+Helpers exist to perform the core of uid and gid alterations:
+
+cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
+ int force);
+cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
+ int force);
+cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid);
+cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid);
+
+The force argument means that while the caller does not have CAP_SETUID
+or CAP_SETGID, the credentials were received from a task with CAP_GRANT_ID.
+
+These helpers are used in kernel/sys.c for the analogous syscalls.
+As can be seen in those examples, these helpers are to be wrapped
+between calls to prepare_creds() and commit_creds() or abort_creds().

MANAGING CREDENTIALS
--------------------
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 52507c3..8034e22 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -22,6 +22,9 @@ struct user_struct;
struct cred;
struct inode;

+/* defined in sys.c, used in cred_setresuid */
+extern int set_user(struct cred *new);
+
/*
* COW Supplementary groups list
*/
@@ -396,4 +399,13 @@ do { \
*(_fsgid) = __cred->fsgid; \
} while(0)

+#define CRED_SETID_NOFORCE 0
+#define CRED_SETID_FORCE 1
+extern int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
+ int force);
+extern int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
+ int force);
+extern int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid);
+extern int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid);
+
#endif /* _LINUX_CRED_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index e1dbe9e..4fc3284 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -785,6 +785,125 @@ int set_create_files_as(struct cred *new, struct inode *inode)
}
EXPORT_SYMBOL(set_create_files_as);

+int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
+ int force)
+{
+ int retval;
+ const struct cred *old;
+
+ retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES);
+ if (retval)
+ return retval;
+ old = current_cred();
+
+ if (!force && !capable(CAP_SETUID)) {
+ if (ruid != (uid_t) -1 && ruid != old->uid &&
+ ruid != old->euid && ruid != old->suid)
+ return -EPERM;
+ if (euid != (uid_t) -1 && euid != old->uid &&
+ euid != old->euid && euid != old->suid)
+ return -EPERM;
+ if (suid != (uid_t) -1 && suid != old->uid &&
+ suid != old->euid && suid != old->suid)
+ return -EPERM;
+ }
+
+ if (ruid != (uid_t) -1) {
+ new->uid = ruid;
+ if (ruid != old->uid) {
+ retval = set_user(new);
+ if (retval < 0)
+ return retval;
+ }
+ }
+ if (euid != (uid_t) -1)
+ new->euid = euid;
+ if (suid != (uid_t) -1)
+ new->suid = suid;
+ new->fsuid = new->euid;
+
+ return security_task_fix_setuid(new, old, LSM_SETID_RES);
+}
+EXPORT_SYMBOL_GPL(cred_setresuid);
+
+int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
+ int force)
+{
+ const struct cred *old = current_cred();
+ int retval;
+
+ retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES);
+ if (retval)
+ return retval;
+
+ if (!force && !capable(CAP_SETGID)) {
+ if (rgid != (gid_t) -1 && rgid != old->gid &&
+ rgid != old->egid && rgid != old->sgid)
+ return -EPERM;
+ if (egid != (gid_t) -1 && egid != old->gid &&
+ egid != old->egid && egid != old->sgid)
+ return -EPERM;
+ if (sgid != (gid_t) -1 && sgid != old->gid &&
+ sgid != old->egid && sgid != old->sgid)
+ return -EPERM;
+ }
+
+ if (rgid != (gid_t) -1)
+ new->gid = rgid;
+ if (egid != (gid_t) -1)
+ new->egid = egid;
+ if (sgid != (gid_t) -1)
+ new->sgid = sgid;
+ new->fsgid = new->egid;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cred_setresgid);
+
+int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid)
+{
+ const struct cred *old;
+
+ old = current_cred();
+ *old_fsuid = old->fsuid;
+
+ if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS) < 0)
+ return -EPERM;
+
+ if (uid == old->uid || uid == old->euid ||
+ uid == old->suid || uid == old->fsuid ||
+ capable(CAP_SETUID)) {
+ if (uid != *old_fsuid) {
+ new->fsuid = uid;
+ if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
+ return 0;
+ }
+ }
+ return -EPERM;
+}
+EXPORT_SYMBOL_GPL(cred_setfsuid);
+
+int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid)
+{
+ const struct cred *old;
+
+ old = current_cred();
+ *old_fsgid = old->fsgid;
+
+ if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS))
+ return -EPERM;
+
+ if (gid == old->gid || gid == old->egid ||
+ gid == old->sgid || gid == old->fsgid ||
+ capable(CAP_SETGID)) {
+ if (gid != *old_fsgid) {
+ new->fsgid = gid;
+ return 0;
+ }
+ }
+ return -EPERM;
+}
+EXPORT_SYMBOL_GPL(cred_setfsgid);
+
#ifdef CONFIG_DEBUG_CREDENTIALS

bool creds_are_invalid(const struct cred *cred)
diff --git a/kernel/sys.c b/kernel/sys.c
index 6d1a7e0..78f32eb 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -565,11 +565,11 @@ error:
/*
* change the user struct in a credentials set to match the new UID
*/
-static int set_user(struct cred *new)
+int set_user(struct cred *new)
{
struct user_struct *new_user;

- new_user = alloc_uid(current_user_ns(), new->uid);
+ new_user = alloc_uid(new->user->user_ns, new->uid);
if (!new_user)
return -EAGAIN;

@@ -711,7 +711,6 @@ error:
*/
SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
{
- const struct cred *old;
struct cred *new;
int retval;

@@ -719,45 +718,10 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
if (!new)
return -ENOMEM;

- retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES);
- if (retval)
- goto error;
- old = current_cred();
+ retval = cred_setresuid(new, ruid, euid, suid, CRED_SETID_NOFORCE);
+ if (retval == 0)
+ return commit_creds(new);

- retval = -EPERM;
- if (!capable(CAP_SETUID)) {
- if (ruid != (uid_t) -1 && ruid != old->uid &&
- ruid != old->euid && ruid != old->suid)
- goto error;
- if (euid != (uid_t) -1 && euid != old->uid &&
- euid != old->euid && euid != old->suid)
- goto error;
- if (suid != (uid_t) -1 && suid != old->uid &&
- suid != old->euid && suid != old->suid)
- goto error;
- }
-
- if (ruid != (uid_t) -1) {
- new->uid = ruid;
- if (ruid != old->uid) {
- retval = set_user(new);
- if (retval < 0)
- goto error;
- }
- }
- if (euid != (uid_t) -1)
- new->euid = euid;
- if (suid != (uid_t) -1)
- new->suid = suid;
- new->fsuid = new->euid;
-
- retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
- if (retval < 0)
- goto error;
-
- return commit_creds(new);
-
-error:
abort_creds(new);
return retval;
}
@@ -779,43 +743,17 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u
*/
SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
{
- const struct cred *old;
struct cred *new;
int retval;

new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();

- retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES);
- if (retval)
- goto error;
+ retval = cred_setresgid(new, rgid, egid, sgid, CRED_SETID_NOFORCE);
+ if (retval == 0)
+ return commit_creds(new);

- retval = -EPERM;
- if (!capable(CAP_SETGID)) {
- if (rgid != (gid_t) -1 && rgid != old->gid &&
- rgid != old->egid && rgid != old->sgid)
- goto error;
- if (egid != (gid_t) -1 && egid != old->gid &&
- egid != old->egid && egid != old->sgid)
- goto error;
- if (sgid != (gid_t) -1 && sgid != old->gid &&
- sgid != old->egid && sgid != old->sgid)
- goto error;
- }
-
- if (rgid != (gid_t) -1)
- new->gid = rgid;
- if (egid != (gid_t) -1)
- new->egid = egid;
- if (sgid != (gid_t) -1)
- new->sgid = sgid;
- new->fsgid = new->egid;
-
- return commit_creds(new);
-
-error:
abort_creds(new);
return retval;
}
@@ -841,35 +779,20 @@ SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __u
*/
SYSCALL_DEFINE1(setfsuid, uid_t, uid)
{
- const struct cred *old;
struct cred *new;
uid_t old_fsuid;
+ int retval;

new = prepare_creds();
if (!new)
return current_fsuid();
- old = current_cred();
- old_fsuid = old->fsuid;
-
- if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS) < 0)
- goto error;
-
- if (uid == old->uid || uid == old->euid ||
- uid == old->suid || uid == old->fsuid ||
- capable(CAP_SETUID)) {
- if (uid != old_fsuid) {
- new->fsuid = uid;
- if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
- goto change_okay;
- }
- }

-error:
- abort_creds(new);
- return old_fsuid;
+ retval = cred_setfsuid(new, uid, &old_fsuid);
+ if (retval == 0)
+ commit_creds(new);
+ else
+ abort_creds(new);

-change_okay:
- commit_creds(new);
return old_fsuid;
}

@@ -878,34 +801,20 @@ change_okay:
*/
SYSCALL_DEFINE1(setfsgid, gid_t, gid)
{
- const struct cred *old;
struct cred *new;
gid_t old_fsgid;
+ int retval;

new = prepare_creds();
if (!new)
return current_fsgid();
- old = current_cred();
- old_fsgid = old->fsgid;
-
- if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS))
- goto error;
-
- if (gid == old->gid || gid == old->egid ||
- gid == old->sgid || gid == old->fsgid ||
- capable(CAP_SETGID)) {
- if (gid != old_fsgid) {
- new->fsgid = gid;
- goto change_okay;
- }
- }

-error:
- abort_creds(new);
- return old_fsgid;
+ retval = cred_setfsgid(new, gid, &old_fsgid);
+ if (retval == 0)
+ commit_creds(new);
+ else
+ abort_creds(new);

-change_okay:
- commit_creds(new);
return old_fsgid;
}

--
1.7.0.4
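
The split described in this patch (checks against the caller's current ids, changes applied to a passed-in cred) can be illustrated outside the kernel. The following is only a userspace model, not kernel code: struct model_cred, NO_CHANGE, and the model_* functions are invented for the sketch, the LSM hooks and set_user() are left out, and "privileged" stands in for force || capable(CAP_SETUID).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the uid fields of struct cred (illustrative only). */
struct model_cred {
	unsigned int uid, euid, suid, fsuid;
};

#define NO_CHANGE ((unsigned int)-1)

/* An unprivileged caller may only pick an id it already holds. */
int id_allowed(const struct model_cred *old, unsigned int id)
{
	return id == NO_CHANGE || id == old->uid ||
	       id == old->euid || id == old->suid;
}

/*
 * Same shape as cred_setresuid(): permission is judged against "old",
 * the requested change is applied to "new".
 */
int model_setresuid(struct model_cred *new, const struct model_cred *old,
		    unsigned int ruid, unsigned int euid,
		    unsigned int suid, int privileged)
{
	assert(new != NULL && old != NULL);

	if (!privileged &&
	    !(id_allowed(old, ruid) && id_allowed(old, euid) &&
	      id_allowed(old, suid)))
		return -EPERM;

	if (ruid != NO_CHANGE)
		new->uid = ruid;
	if (euid != NO_CHANGE)
		new->euid = euid;
	if (suid != NO_CHANGE)
		new->suid = suid;
	new->fsuid = new->euid;
	return 0;
}

/* Work on a copy and publish it only on success. */
int model_change_ids(struct model_cred *task, unsigned int ruid,
		     unsigned int euid, unsigned int suid, int privileged)
{
	struct model_cred new = *task;	/* plays the role of prepare_creds() */
	int ret = model_setresuid(&new, task, ruid, euid, suid, privileged);

	if (ret == 0)
		*task = new;		/* plays the role of commit_creds() */
	return ret;			/* on error the copy is simply dropped */
}
```

In this model a task running as 501 cannot move to 502 unless "privileged" is set, which is the case CRED_SETID_FORCE is meant to cover for p9auth.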

2010-04-27 20:44:04

by Serge E. Hallyn

Subject: [PATCH 2/3] p9auth: add CAP_GRANT_ID to authorize use of /dev/caphash

Granting userid capabilities to another task is a dangerous
privilege. Don't just let file permissions authorize it.
Define CAP_GRANT_ID as a new capability needed to write to
/dev/caphash.

For one thing this lets us start a factotum server early on
in init, then have init drop CAP_GRANT_ID from its bounding
set so the rest of the system cannot regain it.

(This patch is only useful if the next patch, introducing p9auth fs, is
upstreamed)

TODO - patch for capabilities.7 manpage

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Andrew Morgan <[email protected]>
Cc: James Morris <[email protected]>
Cc: [email protected]
---
include/linux/capability.h | 6 +++++-
security/selinux/include/classmap.h | 2 +-
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 39e5ff5..ba2cbfe 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -355,7 +355,11 @@ struct cpu_vfs_cap_data {

#define CAP_MAC_ADMIN 33

-#define CAP_LAST_CAP CAP_MAC_ADMIN
+/* Allow granting setuid capabilities through p9auth /dev/caphash */
+
+#define CAP_GRANT_ID 34
+
+#define CAP_LAST_CAP CAP_GRANT_ID

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 8b32e95..f0ec53a 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -142,7 +142,7 @@ struct security_class_mapping secclass_map[] = {
"node_bind", "name_connect", NULL } },
{ "memprotect", { "mmap_zero", NULL } },
{ "peer", { "recv", NULL } },
- { "capability2", { "mac_override", "mac_admin", NULL } },
+ { "capability2", { "mac_override", "mac_admin", "grant_id", NULL } },
{ "kernel_service", { "use_as_override", "create_files_as", NULL } },
{ "tun_socket",
{ COMMON_SOCK_PERMS, NULL } },
--
1.7.0.4
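
A note on the numbering, as a sketch rather than anything authoritative: capability sets are stored as 32-bit words, so capability 34 lands in the second word, which is presumably why the SELinux hunk extends "capability2" and not "capability". The arithmetic below matches CAP_TO_INDEX() and CAP_TO_MASK() from linux/capability.h; the MODEL_* names and model_* functions are made up for illustration, and CAP_GRANT_ID itself exists only in this proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch above; CAP_GRANT_ID is only proposed here. */
#define MODEL_CAP_MAC_ADMIN 33
#define MODEL_CAP_GRANT_ID  34
#define MODEL_CAP_LAST_CAP  MODEL_CAP_GRANT_ID

/* Which 32-bit word of the set holds this capability. */
unsigned int model_cap_index(int cap)
{
	return (unsigned int)cap >> 5;
}

/* Bit mask of this capability within its word. */
uint32_t model_cap_mask(int cap)
{
	return (uint32_t)1 << (cap & 31);
}

/* Same test as the cap_valid() macro, against the new CAP_LAST_CAP. */
int model_cap_valid(int cap)
{
	return cap >= 0 && cap <= MODEL_CAP_LAST_CAP;
}

/* Nonzero if the capability is present in a two-word set. */
int model_cap_raised(const uint32_t set[2], int cap)
{
	return (set[model_cap_index(cap)] & model_cap_mask(cap)) != 0;
}

/* Clear one capability from a two-word set, as a bounding-set drop would. */
void model_cap_drop(uint32_t set[2], int cap)
{
	assert(model_cap_valid(cap));
	set[model_cap_index(cap)] &= ~model_cap_mask(cap);
}
```

From userspace, the actual bounding-set drop that the commit message has init perform would go through prctl(PR_CAPBSET_DROP, ...), which requires CAP_SETPCAP.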

2010-04-27 20:44:19

by Serge E. Hallyn

Subject: [PATCH 3/3] RFC: p9auth: add p9auth fs

This introduces a Plan 9 style setuid capability filesystem.
See Documentation/p9auth.txt for a description of how to use this.

This fs allows the implementation of completely unprivileged
login daemons. However, doing so requires a fundamental change
regarding linux userids: a server privileged with the new
CAP_GRANT_ID capability can create a one-time setuid capability
allowing another process to change to one specific new userid.
This is a change which must be discussed. The use of this
privilege can be completely prevented by having init remove
CAP_GRANT_ID from its capability bounding set before forking any
processes.

Changelog
Apr 24:
return commit_creds (David Howells)
switch from dev to fs (Eric Biederman)
and move p9auth from drivers/char into kernel/

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Ashwin Ganti <[email protected]>
---
Documentation/p9auth.txt | 42 ++++
MAINTAINERS | 6 +
init/Kconfig | 2 +
kernel/Kconfig.p9auth | 9 +
kernel/Makefile | 1 +
kernel/p9auth.c | 464 ++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 524 insertions(+), 0 deletions(-)
create mode 100644 Documentation/p9auth.txt
create mode 100644 kernel/Kconfig.p9auth
create mode 100644 kernel/p9auth.c

diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
new file mode 100644
index 0000000..9e9f674
--- /dev/null
+++ b/Documentation/p9auth.txt
@@ -0,0 +1,42 @@
+The p9auth filesystem provides a plan-9 factotum-like setuid capability
+API. Tasks which are privileged (authorized by possession of the
+CAP_GRANT_ID privilege (POSIX capability)) can write new capabilities to
+the p9authfs file called cred_grant. The kernel then stores these until
+a task uses them by writing to the cred_use file. Each capability
+represents the ability for a task running as userid X to switch to
+userid Y and some set of groups. Each capability may be used only once,
+and unused capabilities are cleared after two minutes.
+
+The following example shows how to use the API. Shell 1 contains a
+privileged root shell. Shell 2 contains an unprivileged shell as user
+501 in the same user namespace. If not already done, the privileged
+shell should mount the p9auth filesystem:
+
+ mkdir /mnt/p9auth
+ mount -t p9auth p9auth /mnt/p9auth
+
+Now shell 2 somehow communicates to shell 1 that it possesses valid
+login credentials to switch to userid 502. Shell 1 then looks up the
+groups which uid 502 is a member of, and builds a capability string to
+pass to the kernel. It does this by concatenating the old userid, new
+userid, new primary group, number of auxiliary groups, and each
+auxiliary group, all as integers separated by '@'. The resulting string
+is hashed with a random string. In our example, userid 501 may
+transition to userid 502, with primary group 502 and auxiliary group 29.
+
+ capstr="501@502@502@1@29"
+ echo -n "$capstr" > /tmp/txtfile
+ randstr=`dd if=/dev/urandom count=1 2>/dev/null | \
+ uuencode -m - | head -n 2 | tail -n 1 | cut -c -8 `
+ openssl sha1 -hmac "$randstr" /tmp/txtfile | awk '{ print $2 }' \
+ > /tmp/hex
+ ./unhex < /tmp/hex > /mnt/p9auth/cred_grant
+
+Note that to use an empty set of auxiliary groups, you may use
+ capstr="501@502@502@0"
+
+The source for unhex.c can be found in the ltp testsuite under
+ltp-dev/testcases/kernel/security/p9auth. To shell 2 it passes $capstr
+and $randstr. Shell 2 can then transition to the new userid by doing
+
+ echo -n "$capstr@$randstr" > /mnt/p9auth/cred_use
diff --git a/MAINTAINERS b/MAINTAINERS
index a0e3c3a..6bc1bd9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4209,6 +4209,12 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/mwu/mac80211-drivers.git
S: Maintained
F: drivers/net/wireless/p54/

+P9AUTH setuid capability filesystem
+M: [email protected]
+L: [email protected] (suggested Cc:)
+S: Maintained
+F: kernel/p9auth.c
+
PA SEMI ETHERNET DRIVER
M: Olof Johansson <[email protected]>
L: [email protected]
diff --git a/init/Kconfig b/init/Kconfig
index eb77e8c..bc7f1da 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -715,6 +715,8 @@ config NET_NS
Allow user space to create what appear to be multiple instances
of the network stack.

+source "kernel/Kconfig.p9auth"
+
config BLK_DEV_INITRD
bool "Initial RAM filesystem and RAM disk (initramfs/initrd) support"
depends on BROKEN || !FRV
diff --git a/kernel/Kconfig.p9auth b/kernel/Kconfig.p9auth
new file mode 100644
index 0000000..d1c66d2
--- /dev/null
+++ b/kernel/Kconfig.p9auth
@@ -0,0 +1,9 @@
+config PLAN9AUTH
+ tristate "Plan 9 style capability device implementation"
+ default n
+ depends on CRYPTO
+ help
+ This module implements the Plan 9 style capability device.
+
+ To compile this driver as a module, choose
+ M here: the module will be called p9auth.
diff --git a/kernel/Makefile b/kernel/Makefile
index a987aa1..d27dae3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_event.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
obj-$(CONFIG_PADATA) += padata.o
+obj-$(CONFIG_PLAN9AUTH) += p9auth.o

ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
diff --git a/kernel/p9auth.c b/kernel/p9auth.c
new file mode 100644
index 0000000..a174373
--- /dev/null
+++ b/kernel/p9auth.c
@@ -0,0 +1,464 @@
+/*
+ * Plan 9 style setuid capability implementation for the Linux Kernel
+ *
+ * Copyright 2009, 2010 Serge Hallyn <[email protected]>
+ * Copyright 2008, 2009 Ashwin Ganti <[email protected]>
+ *
+ * Released under the GPLv2
+ *
+ */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/errno.h>
+#include <linux/fcntl.h>
+#include <linux/uaccess.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/crypto.h>
+#include <linux/highmem.h>
+#include <linux/scatterlist.h>
+#include <linux/sched.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>
+
+#define MAX_DIGEST_SIZE 20
+
+struct cap_node {
+ char data[MAX_DIGEST_SIZE];
+ struct user_namespace *user_ns;
+ unsigned long time_created;
+ struct list_head list;
+};
+
+/* make CAP_HASH_COUNT_LIM configurable sometime, and per-userns */
+
+#define CAP_HASH_COUNT_LIM 4000
+
+/*
+ * cap_list, the list of valid capability tokens
+ * todo: move into user_namespace?
+ */
+static LIST_HEAD(cap_list);
+static int cap_hash_count; /* number of entries in cap_list */
+
+/*
+ * Locking: writing to both /cred_grant and /cred_use are done
+ * entirely under cap_mutex. So the cap_list and cap_hash_count
+ * are protected by the mutex. These are not fast paths, so a
+ * mutex is just fine.
+ *
+ * Writing to cred_grant only adds an entry to the list, so is safe.
+ * Writing to cred_use only updates current's credentials.
+ */
+static DEFINE_MUTEX(cap_mutex);
+
+MODULE_AUTHOR("Ashwin Ganti");
+MODULE_LICENSE("GPL");
+
+static char *cap_hash(char *plain_text, unsigned int plain_text_size,
+ char *key, unsigned int key_size)
+{
+ struct scatterlist sg;
+ char *result;
+ struct crypto_hash *tfm;
+ struct hash_desc desc;
+ int ret;
+
+ tfm = crypto_alloc_hash("hmac(sha1)", 0, CRYPTO_ALG_ASYNC);
+ if (IS_ERR(tfm)) {
+ printk(KERN_ERR
+ "failed to load transform for hmac(sha1): %ld\n",
+ PTR_ERR(tfm));
+ return NULL;
+ }
+
+ desc.tfm = tfm;
+ desc.flags = 0;
+
+ result = kzalloc(MAX_DIGEST_SIZE, GFP_KERNEL);
+ if (!result) {
+ printk(KERN_ERR "out of memory!\n");
+ goto out;
+ }
+
+ sg_set_buf(&sg, plain_text, plain_text_size);
+
+ ret = crypto_hash_setkey(tfm, key, key_size);
+ if (ret) {
+ printk(KERN_ERR "setkey() failed ret=%d\n", ret);
+ kfree(result);
+ result = NULL;
+ goto out;
+ }
+
+ ret = crypto_hash_digest(&desc, &sg, plain_text_size, result);
+ if (ret) {
+ printk(KERN_ERR "digest () failed ret=%d\n", ret);
+ kfree(result);
+ result = NULL;
+ goto out;
+ }
+
+out:
+ crypto_free_hash(tfm);
+ return result;
+}
+
+struct id_set {
+ char *source_user, *target_user;
+ uid_t old_uid, new_uid;
+ gid_t new_gid;
+ unsigned int ngroups;
+ struct group_info *newgroups;
+ char *full; /* The full entry which must be freed */
+};
+
+/*
+ * read an entry, which is of the form:
+ * source_user@target_user@target_group@numgroups@grp1..@grpn@rand
+ * and put all the values into the supplied id_set.
+ */
+static int parse_user_capability(char *s, struct id_set *set)
+{
+ char *tmp, *tmpu;
+ int i, ret;
+ unsigned long res;
+
+ tmpu = set->full = kstrdup(s, GFP_KERNEL);
+ if (!tmpu)
+ return -ENOMEM;
+
+ ret = -EINVAL;
+ set->source_user = strsep(&tmpu, "@");
+ set->target_user = strsep(&tmpu, "@");
+ tmp = strsep(&tmpu, "@");
+ if (!set->source_user || !set->target_user || !tmp)
+ goto out;
+
+ if (strict_strtoul(set->target_user, 0, &res))
+ goto out;
+ set->new_uid = (uid_t) res;
+ if (strict_strtoul(set->source_user, 0, &res))
+ goto out;
+ set->old_uid = (uid_t) res;
+ if (strict_strtoul(tmp, 0, &res))
+ goto out;
+ set->new_gid = (gid_t) res;
+
+ tmp = strsep(&tmpu, "@");
+ if (!tmp)
+ goto out;
+ if (sscanf(tmp, "%d", &set->ngroups) != 1 || set->ngroups < 0)
+ goto out;
+
+ ret = -ENOMEM;
+ set->newgroups = groups_alloc(set->ngroups);
+ if (!set->newgroups)
+ goto out;
+
+ ret = -EINVAL;
+ for (i = 0; i < set->ngroups; i++) {
+ gid_t g;
+
+ tmp = strsep(&tmpu, "@");
+ if (!tmp || sscanf(tmp, "%d", &g) != 1) {
+ groups_free(set->newgroups);
+ goto out;
+ }
+ GROUP_AT(set->newgroups, i) = g;
+ }
+
+ ret = 0;
+
+out:
+ kfree(set->full);
+ return ret;
+}
+
+static int apply_setuid_capability(struct id_set *set)
+{
+ struct cred *new;
+ int ret;
+
+ /*
+ * Check whether the process writing to cred_use
+ * is actually owned by the source owner
+ */
+ if (set->old_uid != current_uid()) {
+ printk(KERN_ALERT
+ "p9auth: process %d may switch from uid %d to %d, "
+ " but is uid %d (denied).\n", current->pid,
+ set->old_uid, set->new_uid, current_uid());
+ return -EFAULT;
+ }
+
+ /*
+ * Change uid, euid, and fsuid. The suid remains for
+ * flexibility - though I'm torn as to the tradeoff of
+ * usefulness vs. danger in that.
+ */
+ new = prepare_creds();
+ if (!new)
+ return -ENOMEM;
+
+ ret = set_groups(new, set->newgroups);
+ if (!ret)
+ ret = cred_setresgid(new, set->new_gid, set->new_gid,
+ set->new_gid, CRED_SETID_FORCE);
+ if (!ret)
+ ret = cred_setresuid(new, set->new_uid, set->new_uid,
+ set->new_uid, CRED_SETID_FORCE);
+ if (ret == 0)
+ return commit_creds(new);
+ abort_creds(new);
+ return ret;
+}
+
+/* Delete a capability entry from the list */
+static void del_cap_node(struct cap_node *node)
+{
+ list_del(&node->list);
+ put_user_ns(node->user_ns);
+ kfree(node);
+ cap_hash_count--;
+}
+
+/* Expose this through sysctl eventually? 2 min timeout for hashes */
+static int cap_timeout = 120;
+
+/* Remove unused entries older than (cap_timeout) seconds */
+static void remove_stale_entries(void)
+{
+ struct cap_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, &cap_list, list)
+ if (time_after(jiffies, node->time_created + HZ * cap_timeout))
+ del_cap_node(node);
+}
+
+/*
+ * There are CAP_HASH_COUNT_LIM (4k) entries -
+ * trim the 5 oldest even though newer than cap_timeout
+ */
+static void trim_oldest_entries(void)
+{
+ struct cap_node *node, *tmp;
+ int i = 0;
+
+ list_for_each_entry_safe(node, tmp, &cap_list, list) {
+ if (++i > 5)
+ break;
+ del_cap_node(node);
+ }
+}
+
+/*
+ * Add a capability hash entry to the list - called by the
+ * privileged factotum server. Called with cap_mutex held.
+ */
+static int grant_setuid_capability(char *user_buf, size_t count)
+{
+ struct cap_node *node_ptr;
+
+ if (count > MAX_DIGEST_SIZE)
+ return -EINVAL;
+ if (!capable(CAP_GRANT_ID))
+ return -EPERM;
+ node_ptr = kmalloc(sizeof(struct cap_node), GFP_KERNEL);
+ if (!node_ptr)
+ return -ENOMEM;
+
+ memcpy(node_ptr->data, user_buf, count);
+ node_ptr->user_ns = get_user_ns(current_user_ns());
+ node_ptr->time_created = jiffies;
+ list_add_tail(&(node_ptr->list), &(cap_list));
+ cap_hash_count++;
+ remove_stale_entries();
+ if (cap_hash_count > CAP_HASH_COUNT_LIM)
+ trim_oldest_entries();
+
+ return 0;
+}
+
+/*
+ * Use a capability hash entry from the list - called by the
+ * unprivileged login daemon. Called with cap_mutex held.
+ */
+static int use_setuid_capability(char *ubuf)
+{
+ struct cap_node *node;
+ struct id_set set;
+ int ret, found = 0;
+ char *hashed = NULL, *sep;
+ struct list_head *pos;
+
+ if (list_empty(&(cap_list)))
+ return -EINVAL;
+
+ ret = parse_user_capability(ubuf, &set);
+ if (ret)
+ return ret;
+
+ /*
+ * hash the string user1@user2@ngrp@grp... with randstr as the key
+ * XXX is there any vulnerability we're opening ourselves up to by
+ * not rebuilding the string from its components?
+ */
+ sep = strrchr(ubuf, '@');
+ if (sep) {
+ char *rand = sep + 1;
+ *sep = '\0';
+ hashed = cap_hash(ubuf, strlen(ubuf), rand, strlen(rand));
+ }
+ if (NULL == hashed) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Change the process's uid if the hash is present in the
+ * list of hashes
+ */
+ list_for_each(pos, &(cap_list)) {
+ node = list_entry(pos, struct cap_node, list);
+ if (current_user_ns() != node->user_ns)
+ continue;
+ if (0 == memcmp(hashed, node->data, MAX_DIGEST_SIZE)) {
+ ret = apply_setuid_capability(&set);
+ if (ret < 0)
+ goto out;
+
+ /* Capability may only be used once */
+ del_cap_node(node);
+ found = 1;
+ break;
+ }
+ }
+ if (!found) {
+ printk(KERN_ALERT
+ "Invalid capability written to cred_use\n");
+ ret = -EFAULT;
+ }
+out:
+ put_group_info(set.newgroups);
+ kfree(hashed);
+ return ret;
+}
+
+static ssize_t p9auth_grant_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ ssize_t retval = -ENOMEM;
+ char *user_buf;
+
+ if (mutex_lock_interruptible(&cap_mutex))
+ return -EINTR;
+
+ user_buf = kzalloc(count+1, GFP_KERNEL);
+ if (!user_buf)
+ goto out;
+
+ if (copy_from_user(user_buf, buffer, count)) {
+ retval = -EFAULT;
+ goto out;
+ }
+
+ retval = grant_setuid_capability(user_buf, count);
+ if (retval)
+ goto out;
+ *ppos += count;
+ retval = count;
+out:
+ kfree(user_buf);
+ mutex_unlock(&cap_mutex);
+ return retval;
+}
+
+static const struct file_operations p9auth_grant_operations = {
+ .write = p9auth_grant_write,
+};
+
+static ssize_t p9auth_use_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ ssize_t retval = -ENOMEM;
+ char *user_buf;
+
+ if (mutex_lock_interruptible(&cap_mutex))
+ return -EINTR;
+
+ user_buf = kzalloc(count+1, GFP_KERNEL);
+ if (!user_buf)
+ goto out;
+
+ if (copy_from_user(user_buf, buffer, count)) {
+ retval = -EFAULT;
+ goto out;
+ }
+
+ retval = use_setuid_capability(user_buf);
+ if (retval)
+ goto out;
+ *ppos += count;
+ retval = count;
+out:
+ kfree(user_buf);
+ mutex_unlock(&cap_mutex);
+ return retval;
+}
+
+static const struct file_operations p9auth_use_operations = {
+ .write = p9auth_use_write,
+};
+
+#define P9AUTHFS_MAGIC 0xbc148c66
+
+static int p9auth_fill_super(struct super_block *sb, void *data, int silent)
+{
+ static struct tree_descr files[] = {
+ [2] = {"cred_grant", &p9auth_grant_operations, S_IWUSR},
+ [3] = {"cred_use", &p9auth_use_operations, S_IWUGO},
+ {""}
+ };
+
+ return simple_fill_super(sb, P9AUTHFS_MAGIC, files);
+}
+
+static int p9auth_get_sb(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data, struct vfsmount *mnt)
+{
+ return get_sb_nodev(fs_type, flags, data, p9auth_fill_super, mnt);
+}
+
+static struct file_system_type p9auth_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "p9auth",
+ .get_sb = p9auth_get_sb,
+ .kill_sb = kill_litter_super,
+};
+
+/* delete all hashed entries (at module exit) */
+static void clear_setuid_capabilities(void)
+{
+ struct cap_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, &cap_list, list)
+ del_cap_node(node);
+}
+
+/* no __exit here because it can be called by the init function */
+static void cap_cleanup_module(void)
+{
+ clear_setuid_capabilities();
+ unregister_filesystem(&p9auth_fs_type);
+}
+
+static int __init cap_init_module(void)
+{
+ return register_filesystem(&p9auth_fs_type);
+}
+
+module_init(cap_init_module);
+module_exit(cap_cleanup_module);
--
1.7.0.4
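
For anyone writing a factotum-style server in C rather than shell, the string format from Documentation/p9auth.txt can be produced and split as sketched below. This is a userspace illustration only: build_capstr() and split_randstr() are hypothetical helper names, and the HMAC step is left out because it needs a crypto library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "old_uid@new_uid@new_gid@ngroups[@grp...]" into buf.
 * Returns the string length, or -1 if buf is too small.
 */
int build_capstr(char *buf, size_t size, unsigned int old_uid,
		 unsigned int new_uid, unsigned int new_gid,
		 const unsigned int *groups, unsigned int ngroups)
{
	int n;
	size_t off;

	assert(buf != NULL && (ngroups == 0 || groups != NULL));

	n = snprintf(buf, size, "%u@%u@%u@%u",
		     old_uid, new_uid, new_gid, ngroups);
	if (n < 0 || (size_t)n >= size)
		return -1;
	off = (size_t)n;

	for (unsigned int i = 0; i < ngroups; i++) {
		n = snprintf(buf + off, size - off, "@%u", groups[i]);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}

/*
 * Split "capstr@randstr" in place at the last '@', the way
 * use_setuid_capability() does with strrchr(). Returns a pointer to
 * the random part, or NULL if the string contains no '@'.
 */
char *split_randstr(char *s)
{
	char *sep;

	assert(s != NULL);
	sep = strrchr(s, '@');
	if (!sep)
		return NULL;
	*sep = '\0';
	return sep + 1;
}
```

Because the split happens at the last '@', the random string must not itself contain '@'. The base64 text produced by the documented uuencode -m pipeline satisfies that.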

2010-04-28 15:42:13

by Oleg Nesterov

Subject: Re: [PATCH 3/3] RFC: p9auth: add p9auth fs

On 04/28, Serge E. Hallyn wrote:
>
> Quoting Oleg Nesterov ([email protected]):
>
> > > +static ssize_t p9auth_use_write(struct file *file, const char __user *buffer,
> > > + size_t count, loff_t *ppos)
> > > +{
> > > + ssize_t retval = -ENOMEM;
> > > + char *user_buf;
> > > +
> > > + if (mutex_lock_interruptible(&cap_mutex))
> > > + return -EINTR;
> >
> > EINTR doesn't look exactly right here, especially if TIF_SIGPENDING is
> > spurious. Probably ERESTARTNOINTR makes more sense. Or mutex_lock_killable().
>
> Ashwin had had this as ERESTARTSYS I believe. I'd read something about
> userspace should only see -EINTR so I changed it.

Yes, ERESTARTxxx should not be visible to user space. But it is OK
to return it from the syscall if signal_pending() is true; in this case
it will be changed to EINTR or the system call will be restarted.

ERESTARTNOINTR always restarts the syscall, perhaps after handling the
signal if it is really pending.

> > > + if (copy_from_user(user_buf, buffer, count)) {
> > > + retval = -EFAULT;
> > > + goto out;
> > > + }
> > > +
> > > + retval = use_setuid_capability(user_buf);
> >
> > It seems that use_setuid_capability() pathes assume that user_buf is
> > null terminated? Say, parse_user_capability() does kstrdup(user_buf).
>
> I kzalloc()d to count+1 before, and only copy_from_user() count bytes,
> so the last byte should always be 0.

Ah, indeed, thanks.

Oleg.
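
Seen from userspace, the difference is who does the retry. With -EINTR as in the posted patch, a caller that wants the write to go through has to loop itself, roughly as in the sketch below; with ERESTARTNOINTR the kernel would restart the call and this loop would never trigger. write_retry_eintr() is a hypothetical helper name, not something from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Issue one write(), repeating it only when it failed with EINTR,
 * that is, when it was interrupted before transferring anything.
 * Returns whatever the final write() returned.
 */
ssize_t write_retry_eintr(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	do {
		n = write(fd, buf, len);
	} while (n < 0 && errno == EINTR);
	return n;
}
```

Retrying is harmless in the p9auth case because the -EINTR comes from mutex_lock_interruptible(), before any capability has been looked at or consumed.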

2010-04-28 15:10:55

by Serge E. Hallyn

Subject: Re: [PATCH 3/3] RFC: p9auth: add p9auth fs

Quoting Oleg Nesterov ([email protected]):
> On 04/27, Serge E. Hallyn wrote:
> >
> > This introduces a Plan 9 style setuid capability filesystem.
> > See Documentation/p9auth.txt for a description of how to use this.
>
> Can't comment these changes due to the lack of knowledge, just
> a couple of minor nits.

Thanks, Oleg.

> > +static ssize_t p9auth_use_write(struct file *file, const char __user *buffer,
> > + size_t count, loff_t *ppos)
> > +{
> > + ssize_t retval = -ENOMEM;
> > + char *user_buf;
> > +
> > + if (mutex_lock_interruptible(&cap_mutex))
> > + return -EINTR;
>
> EINTR doesn't look exactly right here, especially if TIF_SIGPENDING is
> spurious. Probably ERESTARTNOINTR makes more sense. Or mutex_lock_killable().

Ashwin had this as ERESTARTSYS, I believe. I'd read that userspace should
only ever see -EINTR, so I changed it. Sounds like I need to follow the
caller chain some more and learn a thing or two before I repost.

> > + user_buf = kzalloc(count+1, GFP_KERNEL);
>
> Probably this is OK, but it looks a bit strange we do no check that
> count is not too large.

Yes, I should check that, thanks!

> > + if (copy_from_user(user_buf, buffer, count)) {
> > + retval = -EFAULT;
> > + goto out;
> > + }
> > +
> > + retval = use_setuid_capability(user_buf);
>
> It seems that use_setuid_capability() pathes assume that user_buf is
> null terminated? Say, parse_user_capability() does kstrdup(user_buf).

I kzalloc()d to count+1 before, and only copy_from_user() count bytes,
so the last byte should always be 0.

Thanks again,

-serge
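
The two buffer points above (bounding count, and relying on the extra zeroed byte for termination) can be shown with a userspace analogue. This is a sketch only: calloc() stands in for kzalloc(), memcpy() for copy_from_user(), and MODEL_MAX_WRITE is a made-up limit, since the thread does not settle on one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_WRITE 4096	/* hypothetical upper bound, not from the patch */

/*
 * Allocate count + 1 zeroed bytes and copy count bytes in, so the
 * result is always NUL-terminated. Returns NULL if count is too large
 * or the allocation fails; the caller frees the result.
 */
char *copy_terminated(const void *src, size_t count)
{
	char *buf;

	assert(src != NULL);
	if (count > MODEL_MAX_WRITE)
		return NULL;

	buf = calloc(count + 1, 1);
	if (!buf)
		return NULL;

	memcpy(buf, src, count);	/* last byte stays 0 */
	return buf;
}
```

Checking count before allocating keeps an arbitrary write size from turning into an arbitrarily large allocation.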

2010-04-28 11:21:22

by Oleg Nesterov

Subject: Re: [PATCH 3/3] RFC: p9auth: add p9auth fs

On 04/27, Serge E. Hallyn wrote:
>
> This introduces a Plan 9 style setuid capability filesystem.
> See Documentation/p9auth.txt for a description of how to use this.

Can't comment on these changes due to the lack of knowledge, just
a couple of minor nits.

> +static ssize_t p9auth_use_write(struct file *file, const char __user *buffer,
> + size_t count, loff_t *ppos)
> +{
> + ssize_t retval = -ENOMEM;
> + char *user_buf;
> +
> + if (mutex_lock_interruptible(&cap_mutex))
> + return -EINTR;

EINTR doesn't look exactly right here, especially if TIF_SIGPENDING is
spurious. Probably ERESTARTNOINTR makes more sense. Or mutex_lock_killable().

> + user_buf = kzalloc(count+1, GFP_KERNEL);

Probably this is OK, but it looks a bit strange we do not check that
count is not too large.

> + if (copy_from_user(user_buf, buffer, count)) {
> + retval = -EFAULT;
> + goto out;
> + }
> +
> + retval = use_setuid_capability(user_buf);

It seems that use_setuid_capability() paths assume that user_buf is
null terminated? Say, parse_user_capability() does kstrdup(user_buf).

Oleg.