2010-04-21 01:27:57

by Serge E. Hallyn

[permalink] [raw]
Subject: [PATCH 1/3] p9auth: split core function out of some set*{u,g}id functions

Break the core functionality of set{fs,res}{u,g}id into cred_setX
which performs the access checks based on current_cred(), but performs
the requested change on a passed-in cred.

Export the helpers, since p9auth can be compiled as a module. It
might be worth not allowing modular p9auth to avoid having to export
them.

Really the setfs{u,g}id helper isn't needed, but move it as
well to keep the code consistent.

This patch also changes set_user() to use new->user->user_ns. While
technically not needed as all callers should have new->user->user_ns
equal to current_userns(), it is more correct and may prevent surprises
in the future.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
include/linux/cred.h | 12 +++++
kernel/cred.c | 119 +++++++++++++++++++++++++++++++++++++++++++++
kernel/sys.c | 131 ++++++++------------------------------------------
3 files changed, 151 insertions(+), 111 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 52507c3..0ce7a50 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -22,6 +22,9 @@ struct user_struct;
struct cred;
struct inode;

+/* defined in sys.c, used in cred_setresuid */
+extern int set_user(struct cred *new);
+
/*
* COW Supplementary groups list
*/
@@ -396,4 +399,13 @@ do { \
*(_fsgid) = __cred->fsgid; \
} while(0)

+#define CRED_SETID_NOFORCE 0
+#define CRED_SETID_FORCE 1
+int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
+ int force);
+int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
+ int force);
+int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid);
+int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid);
+
#endif /* _LINUX_CRED_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index e1dbe9e..4fc3284 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -785,6 +785,125 @@ int set_create_files_as(struct cred *new, struct inode *inode)
}
EXPORT_SYMBOL(set_create_files_as);

+int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
+ int force)
+{
+ int retval;
+ const struct cred *old;
+
+ retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES);
+ if (retval)
+ return retval;
+ old = current_cred();
+
+ if (!force && !capable(CAP_SETUID)) {
+ if (ruid != (uid_t) -1 && ruid != old->uid &&
+ ruid != old->euid && ruid != old->suid)
+ return -EPERM;
+ if (euid != (uid_t) -1 && euid != old->uid &&
+ euid != old->euid && euid != old->suid)
+ return -EPERM;
+ if (suid != (uid_t) -1 && suid != old->uid &&
+ suid != old->euid && suid != old->suid)
+ return -EPERM;
+ }
+
+ if (ruid != (uid_t) -1) {
+ new->uid = ruid;
+ if (ruid != old->uid) {
+ retval = set_user(new);
+ if (retval < 0)
+ return retval;
+ }
+ }
+ if (euid != (uid_t) -1)
+ new->euid = euid;
+ if (suid != (uid_t) -1)
+ new->suid = suid;
+ new->fsuid = new->euid;
+
+ return security_task_fix_setuid(new, old, LSM_SETID_RES);
+}
+EXPORT_SYMBOL_GPL(cred_setresuid);
+
+int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
+ int force)
+{
+ const struct cred *old = current_cred();
+ int retval;
+
+ retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES);
+ if (retval)
+ return retval;
+
+ if (!force && !capable(CAP_SETGID)) {
+ if (rgid != (gid_t) -1 && rgid != old->gid &&
+ rgid != old->egid && rgid != old->sgid)
+ return -EPERM;
+ if (egid != (gid_t) -1 && egid != old->gid &&
+ egid != old->egid && egid != old->sgid)
+ return -EPERM;
+ if (sgid != (gid_t) -1 && sgid != old->gid &&
+ sgid != old->egid && sgid != old->sgid)
+ return -EPERM;
+ }
+
+ if (rgid != (gid_t) -1)
+ new->gid = rgid;
+ if (egid != (gid_t) -1)
+ new->egid = egid;
+ if (sgid != (gid_t) -1)
+ new->sgid = sgid;
+ new->fsgid = new->egid;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cred_setresgid);
+
+int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid)
+{
+ const struct cred *old;
+
+ old = current_cred();
+ *old_fsuid = old->fsuid;
+
+ if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS) < 0)
+ return -EPERM;
+
+ if (uid == old->uid || uid == old->euid ||
+ uid == old->suid || uid == old->fsuid ||
+ capable(CAP_SETUID)) {
+ if (uid != *old_fsuid) {
+ new->fsuid = uid;
+ if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
+ return 0;
+ }
+ }
+ return -EPERM;
+}
+EXPORT_SYMBOL_GPL(cred_setfsuid);
+
+int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid)
+{
+ const struct cred *old;
+
+ old = current_cred();
+ *old_fsgid = old->fsgid;
+
+ if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS))
+ return -EPERM;
+
+ if (gid == old->gid || gid == old->egid ||
+ gid == old->sgid || gid == old->fsgid ||
+ capable(CAP_SETGID)) {
+ if (gid != *old_fsgid) {
+ new->fsgid = gid;
+ return 0;
+ }
+ }
+ return -EPERM;
+}
+EXPORT_SYMBOL_GPL(cred_setfsgid);
+
#ifdef CONFIG_DEBUG_CREDENTIALS

bool creds_are_invalid(const struct cred *cred)
diff --git a/kernel/sys.c b/kernel/sys.c
index 6d1a7e0..78f32eb 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -565,11 +565,11 @@ error:
/*
* change the user struct in a credentials set to match the new UID
*/
-static int set_user(struct cred *new)
+int set_user(struct cred *new)
{
struct user_struct *new_user;

- new_user = alloc_uid(current_user_ns(), new->uid);
+ new_user = alloc_uid(new->user->user_ns, new->uid);
if (!new_user)
return -EAGAIN;

@@ -711,7 +711,6 @@ error:
*/
SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
{
- const struct cred *old;
struct cred *new;
int retval;

@@ -719,45 +718,10 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
if (!new)
return -ENOMEM;

- retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES);
- if (retval)
- goto error;
- old = current_cred();
+ retval = cred_setresuid(new, ruid, euid, suid, CRED_SETID_NOFORCE);
+ if (retval == 0)
+ return commit_creds(new);

- retval = -EPERM;
- if (!capable(CAP_SETUID)) {
- if (ruid != (uid_t) -1 && ruid != old->uid &&
- ruid != old->euid && ruid != old->suid)
- goto error;
- if (euid != (uid_t) -1 && euid != old->uid &&
- euid != old->euid && euid != old->suid)
- goto error;
- if (suid != (uid_t) -1 && suid != old->uid &&
- suid != old->euid && suid != old->suid)
- goto error;
- }
-
- if (ruid != (uid_t) -1) {
- new->uid = ruid;
- if (ruid != old->uid) {
- retval = set_user(new);
- if (retval < 0)
- goto error;
- }
- }
- if (euid != (uid_t) -1)
- new->euid = euid;
- if (suid != (uid_t) -1)
- new->suid = suid;
- new->fsuid = new->euid;
-
- retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
- if (retval < 0)
- goto error;
-
- return commit_creds(new);
-
-error:
abort_creds(new);
return retval;
}
@@ -779,43 +743,17 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u
*/
SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
{
- const struct cred *old;
struct cred *new;
int retval;

new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();

- retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES);
- if (retval)
- goto error;
+ retval = cred_setresgid(new, rgid, egid, sgid, CRED_SETID_NOFORCE);
+ if (retval == 0)
+ return commit_creds(new);

- retval = -EPERM;
- if (!capable(CAP_SETGID)) {
- if (rgid != (gid_t) -1 && rgid != old->gid &&
- rgid != old->egid && rgid != old->sgid)
- goto error;
- if (egid != (gid_t) -1 && egid != old->gid &&
- egid != old->egid && egid != old->sgid)
- goto error;
- if (sgid != (gid_t) -1 && sgid != old->gid &&
- sgid != old->egid && sgid != old->sgid)
- goto error;
- }
-
- if (rgid != (gid_t) -1)
- new->gid = rgid;
- if (egid != (gid_t) -1)
- new->egid = egid;
- if (sgid != (gid_t) -1)
- new->sgid = sgid;
- new->fsgid = new->egid;
-
- return commit_creds(new);
-
-error:
abort_creds(new);
return retval;
}
@@ -841,35 +779,20 @@ SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __u
*/
SYSCALL_DEFINE1(setfsuid, uid_t, uid)
{
- const struct cred *old;
struct cred *new;
uid_t old_fsuid;
+ int retval;

new = prepare_creds();
if (!new)
return current_fsuid();
- old = current_cred();
- old_fsuid = old->fsuid;
-
- if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS) < 0)
- goto error;
-
- if (uid == old->uid || uid == old->euid ||
- uid == old->suid || uid == old->fsuid ||
- capable(CAP_SETUID)) {
- if (uid != old_fsuid) {
- new->fsuid = uid;
- if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
- goto change_okay;
- }
- }

-error:
- abort_creds(new);
- return old_fsuid;
+ retval = cred_setfsuid(new, uid, &old_fsuid);
+ if (retval == 0)
+ commit_creds(new);
+ else
+ abort_creds(new);

-change_okay:
- commit_creds(new);
return old_fsuid;
}

@@ -878,34 +801,20 @@ change_okay:
*/
SYSCALL_DEFINE1(setfsgid, gid_t, gid)
{
- const struct cred *old;
struct cred *new;
gid_t old_fsgid;
+ int retval;

new = prepare_creds();
if (!new)
return current_fsgid();
- old = current_cred();
- old_fsgid = old->fsgid;
-
- if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS))
- goto error;
-
- if (gid == old->gid || gid == old->egid ||
- gid == old->sgid || gid == old->fsgid ||
- capable(CAP_SETGID)) {
- if (gid != old_fsgid) {
- new->fsgid = gid;
- goto change_okay;
- }
- }

-error:
- abort_creds(new);
- return old_fsgid;
+ retval = cred_setfsgid(new, gid, &old_fsgid);
+ if (retval == 0)
+ commit_creds(new);
+ else
+ abort_creds(new);

-change_okay:
- commit_creds(new);
return old_fsgid;
}

--
1.7.0.4


2010-04-21 01:28:20

by Serge E. Hallyn

[permalink] [raw]
Subject: [PATCH 2/3] p9auth: add CAP_GRANT_ID to authorize use of /dev/caphash


Granting userid capabilities to another task is a dangerous
privilege. Don't just let file permissions authorize it.
Define CAP_GRANT_ID as a new capability needed to write to
/dev/caphash.

For one thing this lets us start a factotum server early on
in init, then have init drop CAP_GRANT_ID from its bounding
set so the rest of the system cannot regain it.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
include/linux/capability.h | 6 +++++-
security/selinux/include/classmap.h | 2 +-
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 39e5ff5..ba2cbfe 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -355,7 +355,11 @@ struct cpu_vfs_cap_data {

#define CAP_MAC_ADMIN 33

-#define CAP_LAST_CAP CAP_MAC_ADMIN
+/* Allow granting setuid capabilities through p9auth /dev/caphash */
+
+#define CAP_GRANT_ID 34
+
+#define CAP_LAST_CAP CAP_GRANT_ID

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 8b32e95..f0ec53a 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -142,7 +142,7 @@ struct security_class_mapping secclass_map[] = {
"node_bind", "name_connect", NULL } },
{ "memprotect", { "mmap_zero", NULL } },
{ "peer", { "recv", NULL } },
- { "capability2", { "mac_override", "mac_admin", NULL } },
+ { "capability2", { "mac_override", "mac_admin", "grant_id", NULL } },
{ "kernel_service", { "use_as_override", "create_files_as", NULL } },
{ "tun_socket",
{ COMMON_SOCK_PERMS, NULL } },
--
1.7.0.4

2010-04-21 01:29:15

by Serge E. Hallyn

[permalink] [raw]
Subject: [PATCH 3/3] p9auth: add p9auth driver

This is a driver that adds Plan 9 style capability device
implementation. See Documentation/p9auth.txt for a description
of how to use this.

This driver allows the implementation of completely unprivileged
login daemons. However, doing so requires a fundamental change
regarding linux userids: a server privileged with the new
CAP_GRANT_ID capability can create a one-time setuid capability
allowing another process to change to one specific new userid.
This is a change which must be discussed. The use of this
privilege can be completely prevented by having init remove
CAP_GRANT_ID from its capability bounding set before forking any
processes.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
Documentation/p9auth.txt | 47 ++++
drivers/char/Kconfig | 2 +
drivers/char/Makefile | 2 +
drivers/char/p9auth/Kconfig | 9 +
drivers/char/p9auth/Makefile | 1 +
drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++
6 files changed, 578 insertions(+), 0 deletions(-)
create mode 100644 Documentation/p9auth.txt
create mode 100644 drivers/char/p9auth/Kconfig
create mode 100644 drivers/char/p9auth/Makefile
create mode 100644 drivers/char/p9auth/p9auth.c

diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
new file mode 100644
index 0000000..14a69d8
--- /dev/null
+++ b/Documentation/p9auth.txt
@@ -0,0 +1,47 @@
+The p9auth device driver implements a plan-9 factotum-like
+capability API. Tasks which are privileged (authorized by
+possession of the CAP_GRANT_ID privilege (POSIX capability))
+can write new capabilities to /dev/caphash. The kernel then
+stores these until a task uses them by writing to the
+/dev/capuse device. Each capability represents the ability
+for a task running as userid X to switch to userid Y and
+some set of groups. Each capability may be used only once,
+and unused capabilities are cleared after two minutes.
+
+The following examples shows how to use the API. Shell 1
+contains a privileged root shell. Shell 2 contains an
+unprivileged shell as user 501 in the same user namespace. If
+not already done, the privileged shell should create the p9auth
+devices:
+
+ majfile=/sys/module/p9auth/parameters/cap_major
+ minfile=/sys/module/p9auth/parameters/cap_minor
+ maj=`cat $majfile`
+ mknod /dev/caphash c $maj 0
+ min=`cat $minfile`
+ mknod /dev/capuse c $maj 1
+ chmod ugo+w /dev/capuse
+
+Now shell 2 somehow communicates to shell 1 that it possesses
+valid login credentials to switch to userid 502. Shell 2 then
+looks up the groups which uid 502 is a member of, and builds
+a capability string to pass to the kernel. It does this by
+concatenating the old userid, new userid, new primary group,
+number of auxiliary groups, and each auxiliary group, all
+as integers separated by '@'. The resulting string is hashed
+with a random string. In our example, userid 501 may transition
+to userid 502, with primary group 502 and auxiliary group 29.
+
+ capstr="501@502@502@1@29"
+ echo -n "$capstr" > /tmp/txtfile
+ randstr=`dd if=/dev/urandom count=1 2>/dev/null | \
+ uuencode -m - | head -n 2 | tail -n 1 | cut -c -8 `
+ openssl sha1 -hmac "$randstr" /tmp/txtfile | awk '{ print $2 '} \
+ > /tmp/hex
+ ./unhex < /tmp/hex > /dev/caphash
+
+The source for unhex.c can be found in the ltp testsuite under
+ltp-dev/testcases/kernel/security/p9auth. To shell 2 it passes $capstr
+and $randstr. Shell 2 can then transition to the new userid by doing
+
+ echo -n "$capstr@$randstr" > /dev/capuse
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3141dd3..e7ff2a9 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -1113,5 +1113,7 @@ config DEVPORT

source "drivers/s390/char/Kconfig"

+source "drivers/char/p9auth/Kconfig"
+
endmenu

diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index f957edf..3c27905 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -111,6 +111,8 @@ obj-$(CONFIG_PS3_FLASH) += ps3flash.o
obj-$(CONFIG_JS_RTC) += js-rtc.o
js-rtc-y = rtc.o

+obj-$(CONFIG_PLAN9AUTH) += p9auth/
+
# Files generated that shall be removed upon make clean
clean-files := consolemap_deftbl.c defkeymap.c

diff --git a/drivers/char/p9auth/Kconfig b/drivers/char/p9auth/Kconfig
new file mode 100644
index 0000000..d1c66d2
--- /dev/null
+++ b/drivers/char/p9auth/Kconfig
@@ -0,0 +1,9 @@
+config PLAN9AUTH
+ tristate "Plan 9 style capability device implementation"
+ default n
+ depends on CRYPTO
+ help
+ This module implements the Plan 9 style capability device.
+
+ To compile this driver as a module, choose
+ M here: the module will be called p9auth.
diff --git a/drivers/char/p9auth/Makefile b/drivers/char/p9auth/Makefile
new file mode 100644
index 0000000..3ebf6ff
--- /dev/null
+++ b/drivers/char/p9auth/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_PLAN9AUTH) += p9auth.o
diff --git a/drivers/char/p9auth/p9auth.c b/drivers/char/p9auth/p9auth.c
new file mode 100644
index 0000000..d14f709
--- /dev/null
+++ b/drivers/char/p9auth/p9auth.c
@@ -0,0 +1,517 @@
+/*
+ * Plan 9 style capability device implementation for the Linux Kernel
+ *
+ * Copyright 2008, 2009 Ashwin Ganti <[email protected]>
+ *
+ * Released under the GPLv2
+ *
+ */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/moduleparam.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/errno.h>
+#include <linux/fcntl.h>
+#include <linux/cdev.h>
+#include <linux/uaccess.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/crypto.h>
+#include <linux/highmem.h>
+#include <linux/scatterlist.h>
+#include <linux/sched.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>
+
+#ifndef CAP_MAJOR
+#define CAP_MAJOR 0
+#endif
+
+#ifndef CAP_NR_DEVS
+#define CAP_NR_DEVS 2 /* caphash and capuse */
+#endif
+
+#ifndef CAP_NODE_SIZE
+#define CAP_NODE_SIZE 20
+#endif
+
+#define MAX_DIGEST_SIZE 20
+
+struct cap_node {
+ char data[CAP_NODE_SIZE];
+ struct user_namespace *user_ns;
+ unsigned long time_created;
+ struct list_head list;
+};
+
+#define CAP_HASH_COUNT_LIM 4000 /* make configurable sometime */
+/*
+ * cap_list, the list of valid capability tokens
+ * todo: put into user_namespace
+ */
+static LIST_HEAD(cap_list);
+static int cap_hash_count; /* number of entries cap_list */
+DEFINE_MUTEX(cap_mutex); /* TODO fix up the locking one day */
+
+struct cap_dev {
+ struct cdev cdev;
+};
+
+static int cap_major = CAP_MAJOR;
+static int cap_minor;
+
+module_param(cap_major, int, S_IRUGO);
+module_param(cap_minor, int, S_IRUGO);
+
+MODULE_AUTHOR("Ashwin Ganti");
+MODULE_LICENSE("GPL");
+
+static struct cap_dev *cap_devices;
+
+static void hexdump(unsigned char *buf, unsigned int len)
+{
+ while (len--)
+ printk(KERN_DEBUG "%02x", *buf++);
+ printk(KERN_DEBUG "\n");
+}
+
+static char *cap_hash(char *plain_text, unsigned int plain_text_size,
+ char *key, unsigned int key_size)
+{
+ struct scatterlist sg;
+ char *result;
+ struct crypto_hash *tfm;
+ struct hash_desc desc;
+ int ret;
+
+ tfm = crypto_alloc_hash("hmac(sha1)", 0, CRYPTO_ALG_ASYNC);
+ if (IS_ERR(tfm)) {
+ printk(KERN_ERR
+ "failed to load transform for hmac(sha1): %ld\n",
+ PTR_ERR(tfm));
+ return NULL;
+ }
+
+ desc.tfm = tfm;
+ desc.flags = 0;
+
+ result = kzalloc(MAX_DIGEST_SIZE, GFP_KERNEL);
+ if (!result) {
+ printk(KERN_ERR "out of memory!\n");
+ goto out;
+ }
+
+ sg_set_buf(&sg, plain_text, plain_text_size);
+
+ ret = crypto_hash_setkey(tfm, key, key_size);
+ if (ret) {
+ printk(KERN_ERR "setkey() failed ret=%d\n", ret);
+ kfree(result);
+ result = NULL;
+ goto out;
+ }
+
+ ret = crypto_hash_digest(&desc, &sg, plain_text_size, result);
+ if (ret) {
+ printk(KERN_ERR "digest () failed ret=%d\n", ret);
+ kfree(result);
+ result = NULL;
+ goto out;
+ }
+
+ printk(KERN_DEBUG "crypto hash digest size %d\n",
+ crypto_hash_digestsize(tfm));
+ hexdump(result, MAX_DIGEST_SIZE);
+
+out:
+ crypto_free_hash(tfm);
+ return result;
+}
+
+static int cap_open(struct inode *inode, struct file *filp)
+{
+ struct cap_dev *dev;
+ dev = container_of(inode->i_cdev, struct cap_dev, cdev);
+ filp->private_data = dev;
+
+ return 0;
+}
+
+static int cap_release(struct inode *inode, struct file *filp)
+{
+ return 0;
+}
+
+struct id_set {
+ char *source_user, *target_user;
+ uid_t old_uid, new_uid;
+ gid_t new_gid;
+ unsigned int ngroups;
+ struct group_info *newgroups;
+ char *full; /* The full entry which must be freed */
+};
+
+/*
+ * read an entry. For now it is
+ * source_user@target_user@rand
+ * Next it will become
+ * source_user@target_user@target_group@numgroups@grp1..@grpn@rand
+ */
+static int parse_user_capability(char *s, struct id_set *set)
+{
+ char *tmp, *tmpu;
+ int i, ret;
+ unsigned long res;
+
+ /*
+ * break the supplied string into tokens with @ as the
+ * delimiter If the string is "user1@user2@randomstring" we
+ * need to split it and hash 'user1@user2' using 'randomstring'
+ * as the key.
+ */
+ tmpu = set->full = kstrdup(s, GFP_KERNEL);
+ if (!tmpu)
+ return -ENOMEM;
+
+ ret = -EINVAL;
+ set->source_user = strsep(&tmpu, "@");
+ set->target_user = strsep(&tmpu, "@");
+ tmp = strsep(&tmpu, "@");
+ if (!set->source_user || !set->target_user || !tmp)
+ goto out;
+
+ if (strict_strtoul(set->target_user, 0, &res))
+ goto out;
+ set->new_uid = (uid_t) res;
+ if (strict_strtoul(set->source_user, 0, &res))
+ goto out;
+ set->old_uid = (uid_t) res;
+ if (strict_strtoul(tmp, 0, &res))
+ goto out;
+ set->new_gid = (gid_t) res;
+
+ tmp = strsep(&tmpu, "@");
+ if (!tmp)
+ goto out;
+ if (sscanf(tmp, "%d", &set->ngroups) != 1 || set->ngroups < 0)
+ goto out;
+
+ ret = -ENOMEM;
+ set->newgroups = groups_alloc(set->ngroups);
+ if (!set->newgroups)
+ goto out;
+
+ ret = -EINVAL;
+ for (i = 0; i < set->ngroups; i++) {
+ gid_t g;
+
+ tmp = strsep(&tmpu, "@");
+ if (!tmp || sscanf(tmp, "%d", &g) != 1) {
+ groups_free(set->newgroups);
+ goto out;
+ }
+ GROUP_AT(set->newgroups, i) = g;
+ }
+
+ ret = 0;
+
+out:
+ kfree(set->full);
+ return ret;
+}
+
+static int grant_id(struct id_set *set)
+{
+ struct cred *new;
+ int ret;
+
+ /*
+ * Check whether the process writing to capuse
+ * is actually owned by the source owner
+ */
+ if (set->old_uid != current_uid()) {
+ printk(KERN_ALERT
+ "p9auth: process %d may switch from uid %d to %d, "
+ " but is uid %d (denied).\n", current->pid,
+ set->old_uid, set->new_uid, current_uid());
+ return -EFAULT;
+ }
+
+ /*
+ * Change uid, euid, and fsuid. The suid remains for
+ * flexibility - though I'm torn as to the tradeoff of
+ * usefulness vs. danger in that.
+ */
+ new = prepare_creds();
+ if (!new)
+ return -ENOMEM;
+
+ ret = set_groups(new, set->newgroups);
+ if (!ret)
+ ret = cred_setresgid(new, set->new_gid, set->new_gid,
+ set->new_gid, CRED_SETID_FORCE);
+ if (!ret)
+ ret = cred_setresuid(new, set->new_uid, set->new_uid,
+ set->new_uid, CRED_SETID_FORCE);
+ if (ret == 0)
+ commit_creds(new);
+ else
+ abort_creds(new);
+
+ return ret;
+}
+
+/* Delete a capability entry from the list */
+static void del_cap_node(struct cap_node *node)
+{
+ list_del(&node->list);
+ put_user_ns(node->user_ns);
+ kfree(node);
+ cap_hash_count--;
+}
+
+/* Expose this through sysctl eventually? 2 min timeout for hashes */
+static int cap_timeout = 120;
+
+/* Remove unused entries older tha (cap_timeout) seconds */
+static void remove_old_entries(void)
+{
+ struct cap_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, &cap_list, list)
+ if (node->time_created + HZ * cap_timeout < jiffies)
+ del_cap_node(node);
+}
+
+/*
+ * There are CAP_HASH_COUNT_LIM (4k) entries -
+ * trim the 5 oldest even though newer than cap_timeout
+ */
+static void trim_oldest_entries(void)
+{
+ struct cap_node *node, *tmp;
+ int i = 0;
+
+ list_for_each_entry_safe(node, tmp, &cap_list, list) {
+ if (++i > 5)
+ break;
+ del_cap_node(node);
+ }
+}
+
+/*
+ * Add a capability hash entry to the list - called by the
+ * privileged factotum server. Called with cap_mutex held.
+ */
+static int add_caphash_entry(char *user_buf, size_t count)
+{
+ struct cap_node *node_ptr;
+
+ if (count > CAP_NODE_SIZE)
+ return -EINVAL;
+ if (!capable(CAP_GRANT_ID))
+ return -EPERM;
+ node_ptr = kmalloc(sizeof(struct cap_node), GFP_KERNEL);
+ if (!node_ptr)
+ return -ENOMEM;
+
+ printk(KERN_INFO "Capability being written to /dev/caphash :\n");
+ hexdump(user_buf, count);
+ memcpy(node_ptr->data, user_buf, count);
+ node_ptr->user_ns = get_user_ns(current_user_ns());
+ node_ptr->time_created = jiffies;
+ list_add(&(node_ptr->list), &(cap_list));
+ cap_hash_count++;
+ remove_old_entries();
+ if (cap_hash_count > CAP_HASH_COUNT_LIM)
+ trim_oldest_entries();
+
+ return 0;
+}
+
+/*
+ * Use a capability hash entry from the list - called by the
+ * unprivileged login daemon. Called with cap_mutex held.
+ */
+static int use_caphash_entry(char *ubuf)
+{
+ struct cap_node *node;
+ struct id_set set;
+ int ret, found = 0;
+ char *hashed = NULL, *sep;
+ struct list_head *pos;
+
+ if (list_empty(&(cap_list)))
+ return -EINVAL;
+
+ ret = parse_user_capability(ubuf, &set);
+ if (ret)
+ return ret;
+
+ /*
+ * hash the string user1@user2@ngrp@grp... with randstr as the key
+ * XXX is there any vulnerability we're opening ourselves up to by
+ * not rebuilding the string from its components?
+ */
+ sep = strrchr(ubuf, '@');
+ if (sep) {
+ char *rand = sep + 1;
+ *sep = '\0';
+ hashed = cap_hash(ubuf, strlen(ubuf), rand, strlen(rand));
+ }
+ if (NULL == hashed) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Change the process's uid if the hash is present in the
+ * list of hashes
+ */
+ list_for_each(pos, &(cap_list)) {
+ node = list_entry(pos, struct cap_node, list);
+ if (current_user_ns() != node->user_ns)
+ continue;
+ if (0 == memcmp(hashed, node->data, CAP_NODE_SIZE)) {
+ ret = grant_id(&set);
+ if (ret < 0)
+ goto out;
+
+ /* Capability may only be used once */
+ del_cap_node(node);
+ found = 1;
+ break;
+ }
+ }
+ if (!found) {
+ printk(KERN_ALERT
+ "Invalid capabiliy written to /dev/capuse\n");
+ ret = -EFAULT;
+ }
+out:
+ put_group_info(set.newgroups);
+ kfree(hashed);
+ return ret;
+}
+
+static ssize_t cap_write(struct file *filp, const char __user *buf,
+ size_t count, loff_t *f_pos)
+{
+ ssize_t retval = -ENOMEM;
+ char *user_buf;
+
+ if (mutex_lock_interruptible(&cap_mutex))
+ return -EINTR;
+
+ user_buf = kzalloc(count+1, GFP_KERNEL);
+ if (!user_buf)
+ goto out;
+
+ if (copy_from_user(user_buf, buf, count)) {
+ retval = -EFAULT;
+ goto out;
+ }
+
+ /*
+ * If the minor number is 0 ( /dev/caphash ) then simply add the
+ * hashed capability supplied by the user to the list of hashes
+ */
+ if (cap_minor == iminor(filp->f_dentry->d_inode))
+ retval = add_caphash_entry(user_buf, count);
+ else
+ retval = use_caphash_entry(user_buf);
+
+ *f_pos += count;
+ retval = count;
+
+out:
+ kfree(user_buf);
+ mutex_unlock(&cap_mutex);
+ return retval;
+}
+
+static const struct file_operations cap_fops = {
+ .owner = THIS_MODULE,
+ .write = cap_write,
+ .open = cap_open,
+ .release = cap_release,
+};
+
+/* delete all hashed entries (at module exit) */
+static void cap_trim(void)
+{
+ struct cap_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, &cap_list, list)
+ del_cap_node(node);
+}
+
+/* no __exit here because it can be called by the init function */
+static void cap_cleanup_module(void)
+{
+ int i;
+ dev_t devno = MKDEV(cap_major, cap_minor);
+ cap_trim();
+ if (cap_devices) {
+ for (i = 0; i < CAP_NR_DEVS; i++)
+ cdev_del(&cap_devices[i].cdev);
+ kfree(cap_devices);
+ }
+ unregister_chrdev_region(devno, CAP_NR_DEVS);
+
+}
+
+static void cap_setup_cdev(struct cap_dev *dev, int index)
+{
+ int err, devno = MKDEV(cap_major, cap_minor + index);
+ cdev_init(&dev->cdev, &cap_fops);
+ dev->cdev.owner = THIS_MODULE;
+ dev->cdev.ops = &cap_fops;
+ err = cdev_add(&dev->cdev, devno, 1);
+ if (err)
+ printk(KERN_NOTICE "Error %d adding cap%d", err, index);
+}
+
+static int __init cap_init_module(void)
+{
+ int result, i;
+ dev_t dev = 0;
+
+ if (cap_major) {
+ dev = MKDEV(cap_major, cap_minor);
+ result = register_chrdev_region(dev, CAP_NR_DEVS, "cap");
+ } else {
+ result = alloc_chrdev_region(&dev, cap_minor, CAP_NR_DEVS,
+ "cap");
+ cap_major = MAJOR(dev);
+ }
+
+ if (result < 0) {
+ printk(KERN_WARNING "cap: can't get major %d\n",
+ cap_major);
+ return result;
+ }
+
+ cap_devices = kzalloc(CAP_NR_DEVS * sizeof(struct cap_dev),
+ GFP_KERNEL);
+ if (!cap_devices) {
+ result = -ENOMEM;
+ goto fail;
+ }
+
+ /* Initialize each device. */
+ for (i = 0; i < CAP_NR_DEVS; i++)
+ cap_setup_cdev(&cap_devices[i], i);
+
+ return 0;
+
+fail:
+ cap_cleanup_module();
+ return result;
+}
+
+module_init(cap_init_module);
+module_exit(cap_cleanup_module);
+
+
--
1.7.0.4

2010-04-21 03:04:29

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/3] p9auth: add CAP_GRANT_ID to authorize use of /dev/caphash

On Tue, Apr 20, 2010 at 08:28:15PM -0500, Serge E. Hallyn wrote:
>
> Granting userid capabilities to another task is a dangerous
> privilege. Don't just let file permissions authorize it.
> Define CAP_GRANT_ID as a new capability needed to write to
> /dev/caphash.

Isn't there some man page that also needs to be updated when you add
something like this to the user/kernel api?

thanks,

greg k-h

2010-04-21 03:04:45

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
> This is a driver that adds Plan 9 style capability device
> implementation. See Documentation/p9auth.txt for a description
> of how to use this.

Hm, you didn't originally write this driver, so it would be good to get
some original authorship information in here to keep everything correct,
right?

> Documentation/p9auth.txt | 47 ++++
> drivers/char/Kconfig | 2 +
> drivers/char/Makefile | 2 +
> drivers/char/p9auth/Kconfig | 9 +
> drivers/char/p9auth/Makefile | 1 +
> drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++

Is this code really ready for drivers/char/? What has changed in it
that makes it ok to move out of the staging tree?

And who is going to maintain it? You? Or someone else?

> 6 files changed, 578 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/p9auth.txt
> create mode 100644 drivers/char/p9auth/Kconfig
> create mode 100644 drivers/char/p9auth/Makefile
> create mode 100644 drivers/char/p9auth/p9auth.c
>
> diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
> new file mode 100644
> index 0000000..14a69d8
> --- /dev/null
> +++ b/Documentation/p9auth.txt
> @@ -0,0 +1,47 @@
> +The p9auth device driver implements a plan-9 factotum-like
> +capability API. Tasks which are privileged (authorized by
> +possession of the CAP_GRANT_ID privilege (POSIX capability))
> +can write new capabilities to /dev/caphash. The kernel then
> +stores these until a task uses them by writing to the
> +/dev/capuse device. Each capability represents the ability
> +for a task running as userid X to switch to userid Y and
> +some set of groups. Each capability may be used only once,
> +and unused capabilities are cleared after two minutes.
> +
> +The following examples shows how to use the API. Shell 1
> +contains a privileged root shell. Shell 2 contains an
> +unprivileged shell as user 501 in the same user namespace. If
> +not already done, the privileged shell should create the p9auth
> +devices:
> +
> + majfile=/sys/module/p9auth/parameters/cap_major
> + minfile=/sys/module/p9auth/parameters/cap_minor
> + maj=`cat $majfile`
> + mknod /dev/caphash c $maj 0
> + min=`cat $minfile`
> + mknod /dev/capuse c $maj 1
> + chmod ugo+w /dev/capuse

That is incorrect, you don't need the cap_major/minor files at all, the
device node should be automatically created for you, right?

And do you really want to do all of this control through a device node?
Why?

> +DEFINE_MUTEX(cap_mutex); /* TODO fix up the locking one day */

One might hope that today would be that day...

Also, please run this through sparse. This is a variable that doesn't
need to be global.

> +struct cap_dev {
> + struct cdev cdev;
> +};

Do you really need to do it this way? A cdev for a single device node?
That's major overkill.

> +static int cap_major = CAP_MAJOR;
> +static int cap_minor;

Just always use a dynamic misc device, you never need a whole major for
this.

> +module_param(cap_major, int, S_IRUGO);
> +module_param(cap_minor, int, S_IRUGO);

Can be removed.

> +MODULE_AUTHOR("Ashwin Ganti");

Hm, who is going to maintain this, you, or Ashwin?

> +static void hexdump(unsigned char *buf, unsigned int len)
> +{
> + while (len--)
> + printk(KERN_DEBUG "%02x", *buf++);
> + printk(KERN_DEBUG "\n");
> +}

We have a built-in function for this already.

Oh, this function is also incorrect, which is a good reason to use the
built-in ones.

I'm going to stop now, this doesn't belong in drivers/char/ yet, it
needs work...

thanks,

greg k-h

> +/*
> + * read an entry. For now it is
> + * source_user@target_user@rand
> + * Next it will become
> + * source_user@target_user@target_group@numgroups@grp1..@grpn@rand
> + */

Hm, wait, one more.... The kernel/user api is going to change some time
in the future? Please fix this now before it gets merged.

2010-04-21 03:45:38

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Greg KH ([email protected]):
> On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
> > This is a driver that adds Plan 9 style capability device
> > implementation. See Documentation/p9auth.txt for a description
> > of how to use this.
>
> Hm, you didn't originally write this driver, so it would be good to get
> some original authorship information in here to keep everything correct,
> right?

That's why I left the MODULE_AUTHOR line in there - not sure what
else to do for that. I'll add a comment in p9auth.txt, especially
pointing back to Ashwin's original paper.

> > Documentation/p9auth.txt | 47 ++++
> > drivers/char/Kconfig | 2 +
> > drivers/char/Makefile | 2 +
> > drivers/char/p9auth/Kconfig | 9 +
> > drivers/char/p9auth/Makefile | 1 +
> > drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++
>
> Is this code really ready for drivers/char/? What has changed in it
> that makes it ok to move out of the staging tree?

It was dropped from staging :) I don't particularly care to see it
go back into staging, as opposed to working out issues out of tree
(assuming they are solvable). For one thing, as you note below,
there is the question of whether it should be a device driver at
all.

> And who is going to maintain it? You? Or someone else?

If Ashwin doesn't want to maintain it, I'll do it. Either way.

> > 6 files changed, 578 insertions(+), 0 deletions(-)
> > create mode 100644 Documentation/p9auth.txt
> > create mode 100644 drivers/char/p9auth/Kconfig
> > create mode 100644 drivers/char/p9auth/Makefile
> > create mode 100644 drivers/char/p9auth/p9auth.c
> >
> > diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
> > new file mode 100644
> > index 0000000..14a69d8
> > --- /dev/null
> > +++ b/Documentation/p9auth.txt
> > @@ -0,0 +1,47 @@
> > +The p9auth device driver implements a plan-9 factotum-like
> > +capability API. Tasks which are privileged (authorized by
> > +possession of the CAP_GRANT_ID privilege (POSIX capability))
> > +can write new capabilities to /dev/caphash. The kernel then
> > +stores these until a task uses them by writing to the
> > +/dev/capuse device. Each capability represents the ability
> > +for a task running as userid X to switch to userid Y and
> > +some set of groups. Each capability may be used only once,
> > +and unused capabilities are cleared after two minutes.
> > +
> > +The following examples shows how to use the API. Shell 1
> > +contains a privileged root shell. Shell 2 contains an
> > +unprivileged shell as user 501 in the same user namespace. If
> > +not already done, the privileged shell should create the p9auth
> > +devices:
> > +
> > + majfile=/sys/module/p9auth/parameters/cap_major
> > + minfile=/sys/module/p9auth/parameters/cap_minor
> > + maj=`cat $majfile`
> > + mknod /dev/caphash c $maj 0
> > + min=`cat $minfile`
> > + mknod /dev/capuse c $maj 1
> > + chmod ugo+w /dev/capuse
>
> That is incorrect, you don't need the cap_major/minor files at all, the
> device node should be automatically created for you, right?

Hmm, where? Not in /dev on my SLES11 partition...

> And do you really want to do all of this control through a device node?
> Why?

Well...

At first I was thinking same as you were. So I was going to switch
to a pure syscall-based approach. But it just turned out more
complicated. The factotum server would call sys_grantid(), and
the target task would end up doing some huge sys_setresugid() or
else multiple syscalls using the granted id. It just was uglier.
I think there's an experimental patchset sitting somewhere I could
point to (if I weren't embarassed :).

Another possibility would be to use netlink, but that doesn't
appear as amenable to segragation by user namespaces. The pid
(presumably/hopefully global pid, as __u32) is available, so it
shouldn't be impossible, but a simple device with simple synchronous
read/write certainly has its appeal. Firing off a message hoping
that at some point our credentials will be changes, less so.

> > +DEFINE_MUTEX(cap_mutex); /* TODO fix up the locking one day */
>
> One might hope that today would be that day...

Well there's nothing actually wrong with the cap_mutex. I put in
this comment thinking I'd switch to rcu at some point, but it doesn't
seem worthwhile at the moment, by a long shot.

> Also, please run this through sparse. This is a variable that doesn't
> need to be global.
>
> > +struct cap_dev {
> > + struct cdev cdev;
> > +};
>
> Do you really need to do it this way? A cdev for a single device node?
> That's major overkill.
>
> > +static int cap_major = CAP_MAJOR;
> > +static int cap_minor;
>
> Just always use a dynamic misc device, you never need a whole major for
> this.

Right - will switch that over, assuming we don't nix the whole idea
first.

> > +module_param(cap_major, int, S_IRUGO);
> > +module_param(cap_minor, int, S_IRUGO);
>
> Can be removed.
>
> > +MODULE_AUTHOR("Ashwin Ganti");
>
> Hm, who is going to maintain this, you, or Ashwin?

I haven't asked Ashwin, and will be happy either way. Ashwin, did
you want to maintain it?

> > +static void hexdump(unsigned char *buf, unsigned int len)
> > +{
> > + while (len--)
> > + printk(KERN_DEBUG "%02x", *buf++);
> > + printk(KERN_DEBUG "\n");
> > +}
>
> We have a built-in function for this already.
>
> Oh, this function is also incorrect, which is a good reason to use the
> built-in ones.

Really I prefer to get rid of that altogether. Anyone who wants
to see that can use a kprobe hooked to the exit of cap_hash().

> I'm going to stop now, this doesn't belong in drivers/char/ yet, it
> needs work...

/me rolls up his sleeves.

> thanks,
>
> greg k-h
>
> > +/*
> > + * read an entry. For now it is
> > + * source_user@target_user@rand
> > + * Next it will become
> > + * source_user@target_user@target_group@numgroups@grp1..@grpn@rand
> > + */
>
> Hm, wait, one more.... The kernel/user api is going to change some time
> in the future? Please fix this now before it gets merged.

Huh, no that change already happened.

thanks,
-serge

2010-04-21 04:18:56

by Ashwin Ganti

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

On Tue, Apr 20, 2010 at 8:45 PM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Greg KH ([email protected]):
>> Hm, who is going to maintain this, you, or Ashwin?
>
> I haven't asked Ashwin, and will be happy either way. ?Ashwin, did
> you want to maintain it?

Well, since you have been driving a lot of implementation changes
recently, I feel it makes sense for you to be a maintainer. Honestly I
don't think I will be able to find time maintaining this in the future
although I can do the code reviews for you. I will be more than happy
if this feature continues to get traction.

- Ashwin

2010-04-21 04:45:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Greg KH ([email protected]):
>> On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
>> > This is a driver that adds Plan 9 style capability device
>> > implementation. See Documentation/p9auth.txt for a description
>> > of how to use this.
>>
>> Hm, you didn't originally write this driver, so it would be good to get
>> some original authorship information in here to keep everything correct,
>> right?
>
> That's why I left the MODULE_AUTHOR line in there - not sure what
> else to do for that. I'll add a comment in p9auth.txt, especially
> pointing back to Ashwin's original paper.
>
>> > Documentation/p9auth.txt | 47 ++++
>> > drivers/char/Kconfig | 2 +
>> > drivers/char/Makefile | 2 +
>> > drivers/char/p9auth/Kconfig | 9 +
>> > drivers/char/p9auth/Makefile | 1 +
>> > drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++
>>
>> Is this code really ready for drivers/char/? What has changed in it
>> that makes it ok to move out of the staging tree?
>
> It was dropped from staging :) I don't particularly care to see it
> go back into staging, as opposed to working out issues out of tree
> (assuming they are solvable). For one thing, as you note below,
> there is the question of whether it should be a device driver at
> all.
>
>> And who is going to maintain it? You? Or someone else?
>
> If Ashwin doesn't want to maintain it, I'll do it. Either way.
>
>> > 6 files changed, 578 insertions(+), 0 deletions(-)
>> > create mode 100644 Documentation/p9auth.txt
>> > create mode 100644 drivers/char/p9auth/Kconfig
>> > create mode 100644 drivers/char/p9auth/Makefile
>> > create mode 100644 drivers/char/p9auth/p9auth.c
>> >
>> > diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
>> > new file mode 100644
>> > index 0000000..14a69d8
>> > --- /dev/null
>> > +++ b/Documentation/p9auth.txt
>> > @@ -0,0 +1,47 @@
>> > +The p9auth device driver implements a plan-9 factotum-like
>> > +capability API. Tasks which are privileged (authorized by
>> > +possession of the CAP_GRANT_ID privilege (POSIX capability))
>> > +can write new capabilities to /dev/caphash. The kernel then
>> > +stores these until a task uses them by writing to the
>> > +/dev/capuse device. Each capability represents the ability
>> > +for a task running as userid X to switch to userid Y and
>> > +some set of groups. Each capability may be used only once,
>> > +and unused capabilities are cleared after two minutes.
>> > +
>> > +The following examples shows how to use the API. Shell 1
>> > +contains a privileged root shell. Shell 2 contains an
>> > +unprivileged shell as user 501 in the same user namespace. If
>> > +not already done, the privileged shell should create the p9auth
>> > +devices:
>> > +
>> > + majfile=/sys/module/p9auth/parameters/cap_major
>> > + minfile=/sys/module/p9auth/parameters/cap_minor
>> > + maj=`cat $majfile`
>> > + mknod /dev/caphash c $maj 0
>> > + min=`cat $minfile`
>> > + mknod /dev/capuse c $maj 1
>> > + chmod ugo+w /dev/capuse
>>
>> That is incorrect, you don't need the cap_major/minor files at all, the
>> device node should be automatically created for you, right?
>
> Hmm, where? Not in /dev on my SLES11 partition...
>
>> And do you really want to do all of this control through a device node?
>> Why?
>
> Well...
>
> At first I was thinking same as you were. So I was going to switch
> to a pure syscall-based approach. But it just turned out more
> complicated. The factotum server would call sys_grantid(), and
> the target task would end up doing some huge sys_setresugid() or
> else multiple syscalls using the granted id. It just was uglier.
> I think there's an experimental patchset sitting somewhere I could
> point to (if I weren't embarassed :).
>
> Another possibility would be to use netlink, but that doesn't
> appear as amenable to segragation by user namespaces. The pid
> (presumably/hopefully global pid, as __u32) is available, so it
> shouldn't be impossible, but a simple device with simple synchronous
> read/write certainly has its appeal. Firing off a message hoping
> that at some point our credentials will be changes, less so.

pid in the netlink context is the netlink port-id. It is a very
different concept from struct pid. These days netlink calls to
the kernel are synchronous, not that I would encourage netlink
for anything except networking code.

Can we make this a trivial filesystem? I expect that would match
up better with whatever plan9 userspace apps already exist,
remove the inode double translation, and would make it much more
reasonable to do a user namespace aware version if and when
that becomes necessary.

Eric

2010-04-21 09:25:29

by Alan

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

> This is a change which must be discussed. The use of this
> privilege can be completely prevented by having init remove
> CAP_GRANT_ID from its capability bounding set before forking any
> processes.

Which is a minor back compat issue - but you could start without it and
allow init to add it.

It seems a very complex interface to do a simple thing. A long time ago
there was discussion around extending the AF_UNIX fd passing to permit
'pass handle and auth' so you could send someone a handle with a "become
me" token attached.

Alan

2010-04-21 10:48:14

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/3] p9auth: split core function out of some set*{u,g}id functions

Serge E. Hallyn <[email protected]> wrote:

> +int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
> + int force);
> +int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
> + int force);
> +int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid);
> +int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid);

Can you mark these extern please?

Other than that, these functions should probably be mentioned in
Documentation/credentials.txt.

David

2010-04-21 10:50:21

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Serge E. Hallyn <[email protected]> wrote:

> + if (ret == 0)
> + commit_creds(new);
> + else
> + abort_creds(new);
> +
> + return ret;

If you make this:

if (ret == 0)
return commit_creds(new);
abort_creds(new);
return ret;

then gcc can tail-call commit_creds(), which is guaranteed to return 0.

David

2010-04-21 13:22:12

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Eric W. Biederman ([email protected]):
> Can we make this a trivial filesystem? I expect that would match
> up better with whatever plan9 userspace apps already exist,
> remove the inode double translation, and would make it much more
> reasonable to do a user namespace aware version if and when
> that becomes necessary.

Great idea, and so obvious once you mention it. I think that's
the way to go. Thanks, Eric.

-serge

2010-04-21 13:39:42

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Alan Cox ([email protected]):
> > This is a change which must be discussed. The use of this
> > privilege can be completely prevented by having init remove
> > CAP_GRANT_ID from its capability bounding set before forking any
> > processes.
>
> Which is a minor back compat issue - but you could start without it and
> allow init to add it.
>
> It seems a very complex interface to do a simple thing. A long time ago
> there was discussion around extending the AF_UNIX fd passing to permit
> 'pass handle and auth' so you could send someone a handle with a "become
> me" token attached.
>
> Alan

Hi Alan,

sorry I thought I had cc:d you, bc I was pretty sure you'd have some
neat ideas. Like this one.

One could try to argue that this makes every linux process susceptible
to a trojan making it grant its userid to other tasks, but of course
that's silly since the trojan could just fork. Well, what this would
buy the attacker is the ability to sit inconspicuously under his old
userid, holding on to the fd until the admin goes out to coffee before
switching userids.

The other thing is that offhand I think the server can't easily tell
from the socket which user namespace the client is in, as ucred only
has .uid. Though (1) we might need to create a 'struct puser' analogous
to 'struct pid' for signals anyway, (2) userspace can segragate with
fs or net_ns (if abstract sock), and (3) client in a container
presumably won't be able to authenticate itself to server on the
host anyway.

Ashwin (and Ron), I think this idea will give us the same tools that
the p9auth driver does, perhaps in a more unix-y way. Would you have
objections, or do you see shortcomings?

Thanks, Alan.

-serge

2010-04-21 13:40:48

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting David Howells ([email protected]):
> Serge E. Hallyn <[email protected]> wrote:
>
> > + if (ret == 0)
> > + commit_creds(new);
> > + else
> > + abort_creds(new);
> > +
> > + return ret;
>
> If you make this:
>
> if (ret == 0)
> return commit_creds(new);
> abort_creds(new);
> return ret;
>
> then gcc can tail-call commit_creds(), which is guaranteed to return 0.
>
> David

Ok, will do.

thanks,
-serge

2010-04-21 13:40:58

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 1/3] p9auth: split core function out of some set*{u,g}id functions

Quoting David Howells ([email protected]):
> Serge E. Hallyn <[email protected]> wrote:
>
> > +int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid,
> > + int force);
> > +int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid,
> > + int force);
> > +int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid);
> > +int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid);
>
> Can you mark these extern please?
>
> Other than that, these functions should probably be mentioned in
> Documentation/credentials.txt.
>
> David

Will do.

thanks,
-serge

2010-04-21 13:48:18

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Ashwin Ganti ([email protected]):
> On Tue, Apr 20, 2010 at 8:45 PM, Serge E. Hallyn <[email protected]> wrote:
> > Quoting Greg KH ([email protected]):
> >> Hm, who is going to maintain this, you, or Ashwin?
> >
> > I haven't asked Ashwin, and will be happy either way. ?Ashwin, did
> > you want to maintain it?
>
> Well, since you have been driving a lot of implementation changes
> recently, I feel it makes sense for you to be a maintainer. Honestly I
> don't think I will be able to find time maintaining this in the future
> although I can do the code reviews for you. I will be more than happy
> if this feature continues to get traction.
>
> - Ashwin

Ok. I'll put myself in MAINTAINERS. If we stick with the p9auth
driver or filesystem approach, I'd like to keep you in MODULE_AUTHOR()
if that's ok, bc it really is yours, and wouldn't exist without you.

thanks,
-serge

2010-04-21 13:56:56

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

On Wed, 2010-04-21 at 10:27 +0100, Alan Cox wrote:
> > This is a change which must be discussed. The use of this
> > privilege can be completely prevented by having init remove
> > CAP_GRANT_ID from its capability bounding set before forking any
> > processes.
>
> Which is a minor back compat issue - but you could start without it and
> allow init to add it.
>
> It seems a very complex interface to do a simple thing. A long time ago
> there was discussion around extending the AF_UNIX fd passing to permit
> 'pass handle and auth' so you could send someone a handle with a "become
> me" token attached.

If you do go down this path there is a separate (and actually completely
opposite) but related problem I might be able and willing to work with
you on. When looking at how auditing works in this modern day and age
of dbus+polkit to get background processes to do work on behalf of a
user we were discussing an interface that would pass the information
about the user to the background server process. The background server
process could do some magic such that it still had all the permissions
and rights of itself, but had the audit information of the original
user. Thus even though it was a server process with uid=0 that did the
work, the audit logs could know it was actually on behalf of uid=500.

It was discussed passing that token of audit information over an AF_UNIX
socket.

-Eric

2010-04-21 14:15:31

by Alan

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

> sorry I thought I had cc:d you, bc I was pretty sure you'd have some
> neat ideas. Like this one.
>
> One could try to argue that this makes every linux process susceptible
> to a trojan making it grant its userid to other tasks, but of course
> that's silly since the trojan could just fork. Well, what this would
> buy the attacker is the ability to sit inconspicuously under his old
> userid, holding on to the fd until the admin goes out to coffee before
> switching userids.

Possibly, could you put a timestamp in the passed object ? (Given the
expiry of such a uid offer is kind of policy)

> The other thing is that offhand I think the server can't easily tell
> from the socket which user namespace the client is in, as ucred only
> has .uid. Though (1) we might need to create a 'struct puser' analogous
> to 'struct pid' for signals anyway, (2) userspace can segragate with
> fs or net_ns (if abstract sock), and (3) client in a container
> presumably won't be able to authenticate itself to server on the
> host anyway.

That I think needs fixing anyway and now is a good time as any to do it.
If you are going to be able to pass userids then you need to be sure you
don't get passed a handle with a uid change on it that you didn't want so
adding another object type and option of some kind to accept them has to
occur anyway.

That also btw needs fixing for other reasons - more than one daemon has
been written that generically uses recvmsg and so can be attacked with FD
leaks >-)

Alan

2010-04-21 14:30:33

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Eric Paris ([email protected]):
> On Wed, 2010-04-21 at 10:27 +0100, Alan Cox wrote:
> > > This is a change which must be discussed. The use of this
> > > privilege can be completely prevented by having init remove
> > > CAP_GRANT_ID from its capability bounding set before forking any
> > > processes.
> >
> > Which is a minor back compat issue - but you could start without it and
> > allow init to add it.
> >
> > It seems a very complex interface to do a simple thing. A long time ago
> > there was discussion around extending the AF_UNIX fd passing to permit
> > 'pass handle and auth' so you could send someone a handle with a "become
> > me" token attached.
>
> If you do go down this path there is a separate (and actually completely
> opposite) but related problem I might be able and willing to work with
> you on. When looking at how auditing works in this modern day and age
> of dbus+polkit to get background processes to do work on behalf of a

This actually brings up an issue I've been a bit worried about: is
credentials passing for dbus adequate? I thought that the last time
I looked through some code, there was no way in particular for upstart
to pass posix capabilities info along. What that means is that as root
I can do

capsh --drop=(list of all capabilities) --
reboot

and, although I don't have cap_sys_boot, I can reboot the system. So
the only way I can prevent a container from rebooting the host is to
start it in a fresh network namespace to segrate the abstract unix
domain sockets. But if I don't want a fresh network namespace, I'm out
of luck.

> user we were discussing an interface that would pass the information
> about the user to the background server process. The background server
> process could do some magic such that it still had all the permissions
> and rights of itself, but had the audit information of the original
> user. Thus even though it was a server process with uid=0 that did the
> work, the audit logs could know it was actually on behalf of uid=500.
>
> It was discussed passing that token of audit information over an AF_UNIX
> socket.
>
> -Eric

2010-04-21 14:44:08

by Ashwin Ganti

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

On Wed, Apr 21, 2010 at 6:47 AM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Ashwin Ganti ([email protected]):
>> On Tue, Apr 20, 2010 at 8:45 PM, Serge E. Hallyn <[email protected]> wrote:
>> > Quoting Greg KH ([email protected]):
>> >> Hm, who is going to maintain this, you, or Ashwin?
>> >
>> > I haven't asked Ashwin, and will be happy either way. ?Ashwin, did
>> > you want to maintain it?
>>
>> Well, since you have been driving a lot of implementation changes
>> recently, I feel it makes sense for you to be a maintainer. Honestly I
>> don't think I will be able to find time maintaining this in the future
>> although I can do the code reviews for you. I will be more than happy
>> if this feature continues to get traction.
>>
>> - Ashwin
>
> Ok. ?I'll put myself in MAINTAINERS. ?If we stick with the p9auth
> driver or filesystem approach, I'd like to keep you in MODULE_AUTHOR()
> if that's ok, bc it really is yours, and wouldn't exist without you.

Yeah, sure.

- Ashwin

2010-04-21 15:09:18

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Alan Cox ([email protected]):
> > sorry I thought I had cc:d you, bc I was pretty sure you'd have some
> > neat ideas. Like this one.
> >
> > One could try to argue that this makes every linux process susceptible
> > to a trojan making it grant its userid to other tasks, but of course
> > that's silly since the trojan could just fork. Well, what this would
> > buy the attacker is the ability to sit inconspicuously under his old
> > userid, holding on to the fd until the admin goes out to coffee before
> > switching userids.
>
> Possibly, could you put a timestamp in the passed object ? (Given the
> expiry of such a uid offer is kind of policy)

Yup, in fact I do that for the p9auth device driver anyway, to avoid
letting a user take up all kernel memory with unused setuid tokens.

> > The other thing is that offhand I think the server can't easily tell
> > from the socket which user namespace the client is in, as ucred only
> > has .uid. Though (1) we might need to create a 'struct puser' analogous
> > to 'struct pid' for signals anyway, (2) userspace can segragate with
> > fs or net_ns (if abstract sock), and (3) client in a container
> > presumably won't be able to authenticate itself to server on the
> > host anyway.
>
> That I think needs fixing anyway and now is a good time as any to do it.
> If you are going to be able to pass userids then you need to be sure you
> don't get passed a handle with a uid change on it that you didn't want so
> adding another object type and option of some kind to accept them has to
> occur anyway.

Ignoring namespaces for a moment, I guess we could do something like

struct credentials_pass {
pid_t global_pid;
unsigned long unique_id;
uid_t new_uid;
gid_t new_gid;
int num_aux_gids;
gid_t aux_gids[];
}

The unique_id is assigned by the kernel; the client reads the whole
struct credentials_pass - except global_pid - from the unix sock,
verifies the desired credentials, then does sys_acceptid(unique_id)
to accept.

Again, how to pass user namespace information is not obvious to me.

And it's probably debatable whether we want to also pass info about
posix capabilities and LSM security data.

Still, my main concern with this approach is that it makes the server
more complicated. Since I'm passing along my current credentials, that
means I have to construct just the right credentials, which if using
sock identified by pathname may mean I can no longer write to the
socket, right?

And of course one other important consideration is that the overall
p9auth API has been heavily discussed and documented, whereas
implementing our own entirely new approach is breaking new ground
and therefore a lot scarier.

> That also btw needs fixing for other reasons - more than one daemon has
> been written that generically uses recvmsg and so can be attacked with FD
> leaks >-)

Yup.

(By 'needs fixing' you just mean needs to be done right for this
service? Else I think I'm missing something...)

thanks,
-serge

2010-04-21 19:15:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

"Serge E. Hallyn" <[email protected]> writes:

> Ignoring namespaces for a moment, I guess we could do something like
>
> struct credentials_pass {
> pid_t global_pid;
> unsigned long unique_id;
> uid_t new_uid;
> gid_t new_gid;
> int num_aux_gids;
> gid_t aux_gids[];
> }

This looks surprising like what I am doing in passing uids and pids
through unix domain sockets.

So if this looks like a direction we want to go it shouldn't be too
difficult.

>> That also btw needs fixing for other reasons - more than one daemon has
>> been written that generically uses recvmsg and so can be attacked with FD
>> leaks >-)
>
> Yup.
>
> (By 'needs fixing' you just mean needs to be done right for this
> service? Else I think I'm missing something...)

Remember my unix domain socket and the patch for converting struct cred
into a new context, from a month or so ago. I think that is what we
are talking about.

Eric

2010-04-21 20:23:46

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Ignoring namespaces for a moment, I guess we could do something like
> >
> > struct credentials_pass {
> > pid_t global_pid;
> > unsigned long unique_id;
> > uid_t new_uid;
> > gid_t new_gid;
> > int num_aux_gids;
> > gid_t aux_gids[];
> > }
>
> This looks surprising like what I am doing in passing uids and pids
> through unix domain sockets.
>
> So if this looks like a direction we want to go it shouldn't be too
> difficult.
>
> >> That also btw needs fixing for other reasons - more than one daemon has
> >> been written that generically uses recvmsg and so can be attacked with FD
> >> leaks >-)
> >
> > Yup.
> >
> > (By 'needs fixing' you just mean needs to be done right for this
> > service? Else I think I'm missing something...)
>
> Remember my unix domain socket and the patch for converting struct cred
> into a new context, from a month or so ago. I think that is what we
> are talking about.

Zoinks! After some digging I found it in my containers.mbox and at
https://lists.linux-foundation.org/pipermail/containers/2010-March/023405.html
and see you even called me out. Sorry! I see your tree at
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/ebiederm/linux-2.6.33-nsfd-v5.git;a=summary
and commit "af_unix: Allow SO_PEERCRED to work across namespaces", and
it all looks good. Definately useful for a SO_PASSCRED or somesuch
implementation.

thanks,
-serge

2010-04-22 04:57:26

by Kyle Moffett

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

On Wed, Apr 21, 2010 at 15:15, Eric W. Biederman <[email protected]> wrote:
> "Serge E. Hallyn" <[email protected]> writes:
>
>> Ignoring namespaces for a moment, I guess we could do something like
>>
>> struct credentials_pass {
>>       pid_t global_pid;
>>       unsigned long unique_id;
>>       uid_t new_uid;
>>       gid_t new_gid;
>>       int num_aux_gids;
>>       gid_t aux_gids[];
>> }
>
> This looks surprising like what I am doing in passing uids and pids
> through unix domain sockets.
>
> So if this looks like a direction we want to go it shouldn't be too
> difficult.

Hmm... for an alternative idea:

We have this nice "kernel keyring" infrastructure that lets us stuff
arbitrary things into "keys" and grant/revoke them between processes.
What if we created a relatively generic way for processes to package
up privileges (of whatever form) into a "key" that could be granted to
another process (via UNIX-domain socket)? Then the other process
would use a setuid()-ish syscall which would instead apply a specific
key as your credentials, possibly including the audit context and/or
namespaces it came from.

By using the keyring system, such tokens could be kept around across
multiple processes easily (as opposed to FDs), in the same style as a
"sudo" ticket file, for example (even with an expiration time).

Types of credentials you could pass around:
* Capabilities
* Filesystem UID/GID in a particular UID namespace (for FS operations)
* Process UID/GID in a particular UID namespace (for kill(), etc)
* Audit contexts
* SELinux/etc security labels

All of the above could be optionally limited to effectively require a
bprm-secure-style exec() with specific args. So for example, instead
of making "/usr/sbin/passwd" a setuid program, you could make it be an
unprivileged helper. It would connect to a privileged daemon and ask
for a password-change cookie for that particular user. The daemon
would create what is essentially a "delayed exec" key which grants a
specific UID and capabilities when that process performs an execkey().

So as an example, you could rewrite "sudo" as a partially-privileged
daemon and an unprivileged helper. The unpriv helper would send
across a request (optionally including the command and environment)
which would be checked by the daemon. It would then issue a key to
allow the unpriv helper to perform a limited exec.

Another option would be to rewrite network login programs (eg OpenSSH)
to use this for privilege separation. The listening process would get
a non-expiring key to allow it to exec a partially-privileged
password-checking program. If the password-checking program likes the
password it generates a single-use key to pass back to the forked
network process that allows it to exec a program as that user.

Cheers,
Kyle Moffett

2010-04-22 14:36:16

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Kyle Moffett ([email protected]):
> On Wed, Apr 21, 2010 at 15:15, Eric W. Biederman <[email protected]> wrote:
> > "Serge E. Hallyn" <[email protected]> writes:
> >
> >> Ignoring namespaces for a moment, I guess we could do something like
> >>
> >> struct credentials_pass {
> >> ? ? ? pid_t global_pid;
> >> ? ? ? unsigned long unique_id;
> >> ? ? ? uid_t new_uid;
> >> ? ? ? gid_t new_gid;
> >> ? ? ? int num_aux_gids;
> >> ? ? ? gid_t aux_gids[];
> >> }
> >
> > This looks surprising like what I am doing in passing uids and pids
> > through unix domain sockets.
> >
> > So if this looks like a direction we want to go it shouldn't be too
> > difficult.
>
> Hmm... for an alternative idea:
>
> We have this nice "kernel keyring" infrastructure that lets us stuff
> arbitrary things into "keys" and grant/revoke them between processes.
> What if we created a relatively generic way for processes to package
> up privileges (of whatever form) into a "key" that could be granted to
> another process (via UNIX-domain socket)? Then the other process
> would use a setuid()-ish syscall which would instead apply a specific
> key as your credentials, possibly including the audit context and/or
> namespaces it came from.
>
> By using the keyring system, such tokens could be kept around across
> multiple processes easily (as opposed to FDs), in the same style as a
> "sudo" ticket file, for example (even with an expiration time).
>
> Types of credentials you could pass around:
> * Capabilities
> * Filesystem UID/GID in a particular UID namespace (for FS operations)
> * Process UID/GID in a particular UID namespace (for kill(), etc)
> * Audit contexts
> * SELinux/etc security labels
>
> All of the above could be optionally limited to effectively require a
> bprm-secure-style exec() with specific args. So for example, instead
> of making "/usr/sbin/passwd" a setuid program, you could make it be an
> unprivileged helper. It would connect to a privileged daemon and ask
> for a password-change cookie for that particular user. The daemon
> would create what is essentially a "delayed exec" key which grants a
> specific UID and capabilities when that process performs an execkey().

Hi Kyle,

it's a neat idea in terms of flexibility - but in this case the
flexibility really scares me :)

In particular I don't like the ability to keep these tokens around
for later use, and your example of passing around capabilities is
something we've specifically rejected in the past.

That doesn't mean that it wouldn't be a good idea to take your
general approach, use it very inflexibly, and then slowly, as we
gain experience, add flexibility.

I think we are best off starting with two options:

1. pass audit credentials only
2. pass full credentials

credentials meaning uid/gid, though - and let capabilities be
sorted out by the setuid rules.

> So as an example, you could rewrite "sudo" as a partially-privileged
> daemon and an unprivileged helper. The unpriv helper would send
> across a request (optionally including the command and environment)
> which would be checked by the daemon. It would then issue a key to
> allow the unpriv helper to perform a limited exec.
>
> Another option would be to rewrite network login programs (eg OpenSSH)
> to use this for privilege separation. The listening process would get
> a non-expiring key to allow it to exec a partially-privileged
> password-checking program. If the password-checking program likes the
> password it generates a single-use key to pass back to the forked
> network process that allows it to exec a program as that user.

Note that the last two examples are possible with with the simple
p9auth driver, and are really their motivation. In fact, we only
need a single privileged back-end (factotum) server to server a bunch
of unprivileged front-end servers.

So over the next few days I will incorporate a bunch of the feedback
on the p9auth driver itself, turn it into a filesystem, and resend
it. That's not to say that I don't like the other suggestions - I
want to pursue those as well and we can see where we end up and which
approach we prefer. For a first step I look forward to Eric pushing
the patches to do uid and pid translation for unix sock credentials.

thanks,
-serge

2010-04-24 03:36:22

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Greg KH ([email protected]):
> >> On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
> >> > This is a driver that adds Plan 9 style capability device
> >> > implementation. See Documentation/p9auth.txt for a description
> >> > of how to use this.
> >>
> >> Hm, you didn't originally write this driver, so it would be good to get
> >> some original authorship information in here to keep everything correct,
> >> right?
> >
> > That's why I left the MODULE_AUTHOR line in there - not sure what
> > else to do for that. I'll add a comment in p9auth.txt, especially
> > pointing back to Ashwin's original paper.
> >
> >> > Documentation/p9auth.txt | 47 ++++
> >> > drivers/char/Kconfig | 2 +
> >> > drivers/char/Makefile | 2 +
> >> > drivers/char/p9auth/Kconfig | 9 +
> >> > drivers/char/p9auth/Makefile | 1 +
> >> > drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++
> >>
> >> Is this code really ready for drivers/char/? What has changed in it
> >> that makes it ok to move out of the staging tree?
> >
> > It was dropped from staging :) I don't particularly care to see it
> > go back into staging, as opposed to working out issues out of tree
> > (assuming they are solvable). For one thing, as you note below,
> > there is the question of whether it should be a device driver at
> > all.
> >
> >> And who is going to maintain it? You? Or someone else?
> >
> > If Ashwin doesn't want to maintain it, I'll do it. Either way.
> >
> >> > 6 files changed, 578 insertions(+), 0 deletions(-)
> >> > create mode 100644 Documentation/p9auth.txt
> >> > create mode 100644 drivers/char/p9auth/Kconfig
> >> > create mode 100644 drivers/char/p9auth/Makefile
> >> > create mode 100644 drivers/char/p9auth/p9auth.c
> >> >
> >> > diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
> >> > new file mode 100644
> >> > index 0000000..14a69d8
> >> > --- /dev/null
> >> > +++ b/Documentation/p9auth.txt
> >> > @@ -0,0 +1,47 @@
> >> > +The p9auth device driver implements a plan-9 factotum-like
> >> > +capability API. Tasks which are privileged (authorized by
> >> > +possession of the CAP_GRANT_ID privilege (POSIX capability))
> >> > +can write new capabilities to /dev/caphash. The kernel then
> >> > +stores these until a task uses them by writing to the
> >> > +/dev/capuse device. Each capability represents the ability
> >> > +for a task running as userid X to switch to userid Y and
> >> > +some set of groups. Each capability may be used only once,
> >> > +and unused capabilities are cleared after two minutes.
> >> > +
> >> > +The following examples shows how to use the API. Shell 1
> >> > +contains a privileged root shell. Shell 2 contains an
> >> > +unprivileged shell as user 501 in the same user namespace. If
> >> > +not already done, the privileged shell should create the p9auth
> >> > +devices:
> >> > +
> >> > + majfile=/sys/module/p9auth/parameters/cap_major
> >> > + minfile=/sys/module/p9auth/parameters/cap_minor
> >> > + maj=`cat $majfile`
> >> > + mknod /dev/caphash c $maj 0
> >> > + min=`cat $minfile`
> >> > + mknod /dev/capuse c $maj 1
> >> > + chmod ugo+w /dev/capuse
> >>
> >> That is incorrect, you don't need the cap_major/minor files at all, the
> >> device node should be automatically created for you, right?
> >
> > Hmm, where? Not in /dev on my SLES11 partition...
> >
> >> And do you really want to do all of this control through a device node?
> >> Why?
> >
> > Well...
> >
> > At first I was thinking same as you were. So I was going to switch
> > to a pure syscall-based approach. But it just turned out more
> > complicated. The factotum server would call sys_grantid(), and
> > the target task would end up doing some huge sys_setresugid() or
> > else multiple syscalls using the granted id. It just was uglier.
> > I think there's an experimental patchset sitting somewhere I could
> > point to (if I weren't embarassed :).
> >
> > Another possibility would be to use netlink, but that doesn't
> > appear as amenable to segragation by user namespaces. The pid
> > (presumably/hopefully global pid, as __u32) is available, so it
> > shouldn't be impossible, but a simple device with simple synchronous
> > read/write certainly has its appeal. Firing off a message hoping
> > that at some point our credentials will be changes, less so.
>
> pid in the netlink context is the netlink port-id. It is a very
> different concept from struct pid. These days netlink calls to
> the kernel are synchronous, not that I would encourage netlink
> for anything except networking code.
>
> Can we make this a trivial filesystem? I expect that would match
> up better with whatever plan9 userspace apps already exist,
> remove the inode double translation, and would make it much more
> reasonable to do a user namespace aware version if and when

BTW, this current version is user namespace aware.

> that becomes necessary.

An fs actually seems overkill for two write-only files for
process-related information. Would these actually be candidates
for new /proc files?

/proc/grantcred - replaces /dev/caphash, for privileged
tasks to tell the kernel about new setuid
capabilities
/proc/self/usecred - replaces /dev/capuse for unprivileged
tasks to make use of a setuid capability

-serge

2010-04-24 16:25:13

by ron minnich

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

On Fri, Apr 23, 2010 at 8:36 PM, Serge E. Hallyn <[email protected]> wrote:

> An fs actually seems overkill for two write-only files for
> process-related information. ?Would these actually be candidates
> for new /proc files?
>
> ? ? ? ?/proc/grantcred - replaces /dev/caphash, for privileged
> ? ? ? ? ? ? ? ?tasks to tell the kernel about new setuid
> ? ? ? ? ? ? ? ?capabilities
> ? ? ? ?/proc/self/usecred - replaces /dev/capuse for unprivileged
> ? ? ? ? ? ? ? ?tasks to make use of a setuid capability

An fs is fine.

To relate this to Plan 9, where it all began, might be useful. There's
no equivalent in Plan 9 to Linux/Unix devices of the major/minor
number etc. variety. In-kernel drivers and out-of-kernel servers both
end up providing the services (i.e. file name spaces) that we see in a
Linux file system. So the Plan 9 driver for the capability device
really does match closely in function and interface to a Linux
kernel-based file system.

Hence, making devcap a file system is entirely appropriate, because it
best fits the way it works in Plan 9: a kernel driver that provides
two files.

It's pretty easy to write a Linux VFS anyway, so it makes sense from
that point of view.

Eric, that was a great suggestion.

ron

2010-04-24 18:01:50

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

ron minnich <[email protected]> writes:

> On Fri, Apr 23, 2010 at 8:36 PM, Serge E. Hallyn <[email protected]> wrote:
>
>> An fs actually seems overkill for two write-only files for
>> process-related information.  Would these actually be candidates
>> for new /proc files?
>>
>>        /proc/grantcred - replaces /dev/caphash, for privileged
>>                tasks to tell the kernel about new setuid
>>                capabilities
>>        /proc/self/usecred - replaces /dev/capuse for unprivileged
>>                tasks to make use of a setuid capability
>
> An fs is fine.
>
> To relate this to Plan 9, where it all began, might be useful. There's
> no equivalent in Plan 9 to Linux/Unix devices of the major/minor
> number etc. variety. In-kernel drivers and out-of-kernel servers both
> end up providing the services (i.e. file name spaces) that we see in a
> Linux file system. So the Plan 9 driver for the capability device
> really does match closely in function and interface to a Linux
> kernel-based file system.
>
> Hence, making devcap a file system is entirely appropriate, because it
> best fits the way it works in Plan 9: a kernel driver that provides
> two files.
>
> It's pretty easy to write a Linux VFS anyway, so it makes sense from
> that point of view.
>
> Eric, that was a great suggestion.

A fs provides user space policy control of naming. I.e. where the two files go.
That can also be a very big deal. Especially when files are writable.

You have no idea how much I am frustrated by sysfs right now, because
it does not provide userspace policy control and instead mandates a
sometimes inappropriate naming convention.

Eric

2010-04-25 03:27:14

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] p9auth: add p9auth driver

Quoting Eric W. Biederman ([email protected]):
> ron minnich <[email protected]> writes:
>
> > On Fri, Apr 23, 2010 at 8:36 PM, Serge E. Hallyn <[email protected]> wrote:
> >
> >> An fs actually seems overkill for two write-only files for
> >> process-related information. ??Would these actually be candidates
> >> for new /proc files?
> >>
> >> ?? ?? ?? ??/proc/grantcred - replaces /dev/caphash, for privileged
> >> ?? ?? ?? ?? ?? ?? ?? ??tasks to tell the kernel about new setuid
> >> ?? ?? ?? ?? ?? ?? ?? ??capabilities
> >> ?? ?? ?? ??/proc/self/usecred - replaces /dev/capuse for unprivileged
> >> ?? ?? ?? ?? ?? ?? ?? ??tasks to make use of a setuid capability
> >
> > An fs is fine.
> >
> > To relate this to Plan 9, where it all began, might be useful. There's
> > no equivalent in Plan 9 to Linux/Unix devices of the major/minor
> > number etc. variety. In-kernel drivers and out-of-kernel servers both
> > end up providing the services (i.e. file name spaces) that we see in a
> > Linux file system. So the Plan 9 driver for the capability device
> > really does match closely in function and interface to a Linux
> > kernel-based file system.
> >
> > Hence, making devcap a file system is entirely appropriate, because it
> > best fits the way it works in Plan 9: a kernel driver that provides
> > two files.
> >
> > It's pretty easy to write a Linux VFS anyway, so it makes sense from
> > that point of view.
> >
> > Eric, that was a great suggestion.
>
> A fs provides user space policy control of naming. I.e. where the two files go.
> That can also be a very big deal. Especially when files are writable.
>
> You have no idea how much I am frustrated by sysfs right now, because
> it does not provide userspace policy control and instead mandates a
> sometimes inappropriate naming convention.
>
> Eric

Well I'm not convinced that it's a worthwhile tradeoff for polluting
/proc/filesystems and needing yet another fs mounted in each container,
but a preliminary working version using an fs is at
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/sergeh/linux-cr.git;a=shortlog;h=refs/heads/p9auth.apr24.2

I'll do some cleanup before sending it out.

Eric, I'd said that the device-based version was namespace-aware, but
that meant that you could on grant and use capabilities in your own
user namespace. I suppose now that it's an fs we can do better
semantics, where each user ns can mount its own p9auth, and anyone
with CAP_GRANT_ID targeted at some user ns (i.e. root in a user_ns
or the creator of a user_ns) can grant ids to that user ns. Though
I'm not sure that's a feature anyone would ever use, and I do like
the simplicity of just having one sb.

-serge