LinuxLists.cc - [RFC PATCH 1/4] Add a user_namespace as creator/owner of uts

2010-12-09 17:20:27

Subject: [RFC PATCH 1/4] Add a user_namespace as creator/owner of uts_namespace

copy_process() handles CLONE_NEWUSER before the rest of the
namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
the new uts namespace will have the new user namespace as its
owner. That is what we want, since we want root in that new
userns to be able to have privilege over it.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
include/linux/utsname.h | 3 +++
init/version.c | 2 ++
kernel/nsproxy.c | 3 +++
kernel/user.c | 8 ++++++--
kernel/utsname.c | 4 ++++
5 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 69f3997..85171be 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -37,9 +37,12 @@ struct new_utsname {
#include <linux/nsproxy.h>
#include <linux/err.h>

+struct user_namespace;
+
struct uts_namespace {
struct kref kref;
struct new_utsname name;
+ struct user_namespace *user_ns;
};
extern struct uts_namespace init_uts_ns;

diff --git a/init/version.c b/init/version.c
index 79fb8c2c..9eb19fb 100644
--- a/init/version.c
+++ b/init/version.c
@@ -21,6 +21,7 @@ extern int version_string(LINUX_VERSION_CODE);
int version_string(LINUX_VERSION_CODE);
#endif

+extern struct user_namespace init_user_ns;
struct uts_namespace init_uts_ns = {
.kref = {
.refcount = ATOMIC_INIT(2),
@@ -33,6 +34,7 @@ struct uts_namespace init_uts_ns = {
.machine = UTS_MACHINE,
.domainname = UTS_DOMAINNAME,
},
+ .user_ns = &init_user_ns,
};
EXPORT_SYMBOL_GPL(init_uts_ns);

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f74e6c0..5a22dcf 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -74,6 +74,9 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
err = PTR_ERR(new_nsp->uts_ns);
goto out_uts;
}
+ put_user_ns(new_nsp->uts_ns->user_ns);
+ new_nsp->uts_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
+ get_user_ns(new_nsp->uts_ns->user_ns);

new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
if (IS_ERR(new_nsp->ipc_ns)) {
diff --git a/kernel/user.c b/kernel/user.c
index 2c7d8d5..5125681 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -17,9 +17,13 @@
#include <linux/module.h>
#include <linux/user_namespace.h>

+/*
+ * userns count is 1 for root user, 1 for init_uts_ns,
+ * and 1 for... ?
+ */
struct user_namespace init_user_ns = {
.kref = {
- .refcount = ATOMIC_INIT(2),
+ .refcount = ATOMIC_INIT(3),
},
.creator = &root_user,
};
@@ -47,7 +51,7 @@ static struct kmem_cache *uid_cachep;
*/
static DEFINE_SPINLOCK(uidhash_lock);

-/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
+/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
struct user_struct root_user = {
.__count = ATOMIC_INIT(2),
.processes = ATOMIC_INIT(1),
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 8a82b4b..a7b3a8d 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -14,6 +14,7 @@
#include <linux/utsname.h>
#include <linux/err.h>
#include <linux/slab.h>
+#include <linux/user_namespace.h>

static struct uts_namespace *create_uts_ns(void)
{
@@ -40,6 +41,8 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)

down_read(&uts_sem);
memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
+ ns->user_ns = old_ns->user_ns;
+ get_user_ns(ns->user_ns);
up_read(&uts_sem);
return ns;
}
@@ -71,5 +74,6 @@ void free_uts_ns(struct kref *kref)
struct uts_namespace *ns;

ns = container_of(kref, struct uts_namespace, kref);
+ put_user_ns(ns->user_ns);
kfree(ns);
}
--
1.7.2.3

2010-12-09 17:28:42

by Serge E. Hallyn

[permalink] [raw]

Subject: [RFC PATCH 2/4] security: Make capabilities relative to the user namespace.

- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.

The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.

I still need to handle getting all caps after creation but otherwise I
think I have a good starter patch that achieves all of your goals.

Changelog:
11/05/2010: [serge] add apparmor

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Serge E. Hallyn <[email protected]>
---
include/linux/capability.h | 7 +++++--
include/linux/security.h | 12 +++++++-----
kernel/capability.c | 22 ++++++++++++++++++++--
security/apparmor/lsm.c | 5 +++--
security/commoncap.c | 40 +++++++++++++++++++++++++++++++++-------
security/security.c | 12 ++++++------
security/selinux/hooks.c | 14 +++++++++-----
7 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 90012b9..cc3e976 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -541,7 +541,7 @@ extern const kernel_cap_t __cap_init_eff_set;
*
* Note that this does not set PF_SUPERPRIV on the task.
*/
-#define has_capability(t, cap) (security_real_capable((t), (cap)) == 0)
+#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)

/**
* has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
@@ -555,9 +555,12 @@ extern const kernel_cap_t __cap_init_eff_set;
* Note that this does not set PF_SUPERPRIV on the task.
*/
#define has_capability_noaudit(t, cap) \
- (security_real_capable_noaudit((t), (cap)) == 0)
+ (security_real_capable_noaudit((t), &init_user_ns, (cap)) == 0)

+struct user_namespace;
+extern struct user_namespace init_user_ns;
extern int capable(int cap);
+extern int ns_capable(struct user_namespace *ns, int cap);

/* audit system wants to get cap info from files as well */
struct dentry;
diff --git a/include/linux/security.h b/include/linux/security.h
index 39f5b7e..9e05b08 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -46,13 +46,14 @@

struct ctl_table;
struct audit_krule;
+struct user_namespace;

/*
* These functions are in security/capability.c and are used
* as the default capabilities functions
*/
extern int cap_capable(struct task_struct *tsk, const struct cred *cred,
- int cap, int audit);
+ struct user_namespace *ns, int cap, int audit);
extern int cap_settime(struct timespec *ts, struct timezone *tz);
extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
extern int cap_ptrace_traceme(struct task_struct *parent);
@@ -1258,6 +1259,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* credentials.
* @tsk contains the task_struct for the process.
* @cred contains the credentials to use.
+ * @ns contains the user namespace we want the capability in
* @cap contains the capability <include/linux/capability.h>.
* @audit: Whether to write an audit message or not
* Return 0 if the capability is granted for @tsk.
@@ -1386,7 +1388,7 @@ struct security_operations {
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
int (*capable) (struct task_struct *tsk, const struct cred *cred,
- int cap, int audit);
+ struct user_namespace *ns, int cap, int audit);
int (*sysctl) (struct ctl_table *table, int op);
int (*quotactl) (int cmds, int type, int id, struct super_block *sb);
int (*quota_on) (struct dentry *dentry);
@@ -1668,9 +1670,9 @@ int security_capset(struct cred *new, const struct cred *old,
const kernel_cap_t *effective,
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
-int security_capable(int cap);
-int security_real_capable(struct task_struct *tsk, int cap);
-int security_real_capable_noaudit(struct task_struct *tsk, int cap);
+int security_capable(struct user_namespace *ns, int cap);
+int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap);
+int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap);
int security_sysctl(struct ctl_table *table, int op);
int security_quotactl(int cmds, int type, int id, struct super_block *sb);
int security_quota_on(struct dentry *dentry);
diff --git a/kernel/capability.c b/kernel/capability.c
index 2f05303..744dd6e 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -14,6 +14,7 @@
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/*
@@ -301,15 +302,32 @@ error:
*/
int capable(int cap)
{
+ return ns_capable(&init_user_ns, cap);
+}
+EXPORT_SYMBOL(capable);
+
+/**
+ * ns_capable - Determine if the current task has a superior capability in effect
+ * @ns: The usernamespace we want the capability in
+ * @cap: The capability to be tested for
+ *
+ * Return true if the current task has the given superior capability currently
+ * available for use, false if not.
+ *
+ * This sets PF_SUPERPRIV on the task if the capability is available on the
+ * assumption that it's about to be used.
+ */
+int ns_capable(struct user_namespace *ns, int cap)
+{
if (unlikely(!cap_valid(cap))) {
printk(KERN_CRIT "capable() called with invalid cap=%u\n", cap);
BUG();
}

- if (security_capable(cap) == 0) {
+ if (security_capable(ns, cap) == 0) {
current->flags |= PF_SUPERPRIV;
return 1;
}
return 0;
}
-EXPORT_SYMBOL(capable);
+EXPORT_SYMBOL(ns_capable);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index fa778a7..00d227f 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/sysctl.h>
#include <linux/audit.h>
+#include <linux/user_namespace.h>
#include <net/sock.h>

#include "include/apparmor.h"
@@ -137,11 +138,11 @@ static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
}

static int apparmor_capable(struct task_struct *task, const struct cred *cred,
- int cap, int audit)
+ struct user_namespace *ns, int cap, int audit)
{
struct aa_profile *profile;
/* cap_capable returns 0 on success, else -EPERM */
- int error = cap_capable(task, cred, cap, audit);
+ int error = cap_capable(task, cred, ns, cap, audit);
if (!error) {
profile = aa_cred_profile(cred);
if (!unconfined(profile))
diff --git a/security/commoncap.c b/security/commoncap.c
index e58b5d8..dcf2bb4 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>
#include <linux/prctl.h>
#include <linux/securebits.h>
+#include <linux/user_namespace.h>

/*
* If a non-root user executes a setuid-root binary in
@@ -68,6 +69,7 @@ EXPORT_SYMBOL(cap_netlink_recv);
* cap_capable - Determine whether a task has a particular effective capability
* @tsk: The task to query
* @cred: The credentials to use
+ * @ns: The user namespace in which we need the capability
* @cap: The capability to check for
* @audit: Whether to write an audit message or not
*
@@ -79,10 +81,32 @@ EXPORT_SYMBOL(cap_netlink_recv);
* cap_has_capability() returns 0 when a task has a capability, but the
* kernel's capable() and has_capability() returns 1 for this case.
*/
-int cap_capable(struct task_struct *tsk, const struct cred *cred, int cap,
- int audit)
+int cap_capable(struct task_struct *tsk, const struct cred *cred,
+ struct user_namespace *targ_ns, int cap, int audit)
{
- return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+ for (;;) {
+ /* Do we have the necessary capabilities? */
+ if (targ_ns == cred->user->user_ns)
+ return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+
+ /* The creator of the user namespace has all caps. */
+ if (targ_ns->creator == cred->user)
+ return 0;
+
+ /* Have we tried all of the parent namespaces? */
+ if (targ_ns == &init_user_ns)
+ return -EPERM;
+
+ /* If you have the capability in a parent user ns you have it
+ * in the over all children user namespaces as well, so see
+ * if this process has the capability in the parent user
+ * namespace.
+ */
+ targ_ns = targ_ns->creator->user_ns;
+ }
+
+ /* We never get here */
+ return -EPERM;
}

/**
@@ -177,7 +201,8 @@ static inline int cap_inh_is_capped(void)
/* they are so limited unless the current task has the CAP_SETPCAP
* capability
*/
- if (cap_capable(current, current_cred(), CAP_SETPCAP,
+ if (cap_capable(current, current_cred(),
+ current_cred()->user->user_ns, CAP_SETPCAP,
SECURITY_CAP_AUDIT) == 0)
return 0;
return 1;
@@ -829,7 +854,8 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
& (new->securebits ^ arg2)) /*[1]*/
|| ((new->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/
|| (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/
- || (cap_capable(current, current_cred(), CAP_SETPCAP,
+ || (cap_capable(current, current_cred(),
+ current_cred()->user->user_ns, CAP_SETPCAP,
SECURITY_CAP_AUDIT) != 0) /*[4]*/
/*
* [1] no changing of bits that are locked
@@ -894,7 +920,7 @@ int cap_vm_enough_memory(struct mm_struct *mm, long pages)
{
int cap_sys_admin = 0;

- if (cap_capable(current, current_cred(), CAP_SYS_ADMIN,
+ if (cap_capable(current, current_cred(), &init_user_ns, CAP_SYS_ADMIN,
SECURITY_CAP_NOAUDIT) == 0)
cap_sys_admin = 1;
return __vm_enough_memory(mm, pages, cap_sys_admin);
@@ -921,7 +947,7 @@ int cap_file_mmap(struct file *file, unsigned long reqprot,
int ret = 0;

if (addr < dac_mmap_min_addr) {
- ret = cap_capable(current, current_cred(), CAP_SYS_RAWIO,
+ ret = cap_capable(current, current_cred(), &init_user_ns, CAP_SYS_RAWIO,
SECURITY_CAP_AUDIT);
/* set PF_SUPERPRIV if it turns out we allow the low mmap */
if (ret == 0)
diff --git a/security/security.c b/security/security.c
index a774256..a7c1a1f 100644
--- a/security/security.c
+++ b/security/security.c
@@ -172,30 +172,30 @@ int security_capset(struct cred *new, const struct cred *old,
effective, inheritable, permitted);
}

-int security_capable(int cap)
+int security_capable(struct user_namespace *ns, int cap)
{
- return security_ops->capable(current, current_cred(), cap,
+ return security_ops->capable(current, current_cred(), ns, cap,
SECURITY_CAP_AUDIT);
}

-int security_real_capable(struct task_struct *tsk, int cap)
+int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap)
{
const struct cred *cred;
int ret;

cred = get_task_cred(tsk);
- ret = security_ops->capable(tsk, cred, cap, SECURITY_CAP_AUDIT);
+ ret = security_ops->capable(tsk, cred, ns, cap, SECURITY_CAP_AUDIT);
put_cred(cred);
return ret;
}

-int security_real_capable_noaudit(struct task_struct *tsk, int cap)
+int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap)
{
const struct cred *cred;
int ret;

cred = get_task_cred(tsk);
- ret = security_ops->capable(tsk, cred, cap, SECURITY_CAP_NOAUDIT);
+ ret = security_ops->capable(tsk, cred, ns, cap, SECURITY_CAP_NOAUDIT);
put_cred(cred);
return ret;
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 65fa8bf..b9a6a53 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -77,6 +77,7 @@
#include <linux/mutex.h>
#include <linux/posix-timers.h>
#include <linux/syslog.h>
+#include <linux/user_namespace.h>

#include "avc.h"
#include "objsec.h"
@@ -1423,6 +1424,7 @@ static int current_has_perm(const struct task_struct *tsk,
/* Check whether a task is allowed to use a capability. */
static int task_has_capability(struct task_struct *tsk,
const struct cred *cred,
+ struct user_namespace *ns,
int cap, int audit)
{
struct common_audit_data ad;
@@ -1851,15 +1853,15 @@ static int selinux_capset(struct cred *new, const struct cred *old,
*/

static int selinux_capable(struct task_struct *tsk, const struct cred *cred,
- int cap, int audit)
+ struct user_namespace *ns, int cap, int audit)
{
int rc;

- rc = cap_capable(tsk, cred, cap, audit);
+ rc = cap_capable(tsk, cred, ns, cap, audit);
if (rc)
return rc;

- return task_has_capability(tsk, cred, cap, audit);
+ return task_has_capability(tsk, cred, ns, cap, audit);
}

static int selinux_sysctl_get_sid(ctl_table *table, u16 tclass, u32 *sid)
@@ -2012,7 +2014,8 @@ static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
{
int rc, cap_sys_admin = 0;

- rc = selinux_capable(current, current_cred(), CAP_SYS_ADMIN,
+ rc = selinux_capable(current, current_cred(),
+ &init_user_ns, CAP_SYS_ADMIN,
SECURITY_CAP_NOAUDIT);
if (rc == 0)
cap_sys_admin = 1;
@@ -2826,7 +2829,8 @@ static int selinux_inode_getsecurity(const struct inode *inode, const char *name
* and lack of permission just means that we fall back to the
* in-core context value, not a denial.
*/
- error = selinux_capable(current, current_cred(), CAP_MAC_ADMIN,
+ error = selinux_capable(current, current_cred(),
+ &init_user_ns, CAP_MAC_ADMIN,
SECURITY_CAP_NOAUDIT);
if (!error)
error = security_sid_to_context_force(isec->sid, &context,
--
1.7.2.3

2010-12-09 17:30:50

by Serge E. Hallyn

[permalink] [raw]

Subject: [RFC PATCH 3/4] allow sethostname in a container

To test this, you can:
1. clone a new user namespace without a new uts namespace.
You can NOT set hostname.
2. clone both a new user and uts namespace. You can set
hostname.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
kernel/sys.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 2745dcd..9b9b03b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1171,7 +1171,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
int errno;
char tmp[__NEW_UTS_LEN];

- if (!capable(CAP_SYS_ADMIN))
+ if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
--
1.7.2.3

2010-12-09 17:32:44

by Serge E. Hallyn

[permalink] [raw]

Subject: [RFC PATCH 4/4] allow killing tasks in your own or child userns

Changelog:
Dec 8: Fixed bug in my check_kill_permission pointed out by
Eric Biederman.

To test:
1. Test killing tasks as usual. No change.
2. Clone a new user namespace without a new pidns.
a. You CAN kill -CONT tasks in your thread group but outside
your user ns.
b. You can NOT otherwise kill tasks outside your user_ns.
c. Inside your new userns, signal semantics are as normal
with respect to userids, CAP_KILL, and thread groups.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
kernel/signal.c | 27 ++++++++++++++++++++++-----
1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..677025c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -635,6 +635,27 @@ static inline bool si_fromuser(const struct siginfo *info)
(!is_si_special(info) && SI_FROMUSER(info));
}

+static inline int kill_ok_by_cred(struct cred *cred, struct cred *tcred)
+{
+ if (cred->user->user_ns != tcred->user->user_ns) {
+ /* userids are not equivalent - either you have the
+ capability to the target user ns or you don't */
+ if (ns_capable(tcred->user->user_ns, CAP_KILL))
+ return 1;
+ return 0;
+ }
+
+ /* same user namespace - usual credentials checks apply */
+ if ((cred->euid ^ tcred->suid) &&
+ (cred->euid ^ tcred->uid) &&
+ (cred->uid ^ tcred->suid) &&
+ (cred->uid ^ tcred->uid) &&
+ !ns_capable(tcred->user->user_ns, CAP_KILL))
+ return 0;
+
+ return 1;
+}
+
/*
* Bad permissions for sending the signal
* - the caller must hold the RCU read lock
@@ -659,11 +680,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
cred = current_cred();
tcred = __task_cred(t);
if (!same_thread_group(current, t) &&
- (cred->euid ^ tcred->suid) &&
- (cred->euid ^ tcred->uid) &&
- (cred->uid ^ tcred->suid) &&
- (cred->uid ^ tcred->uid) &&
- !capable(CAP_KILL)) {
+ !kill_ok_by_cred(cred, tcred)) {
switch (sig) {
case SIGCONT:
sid = task_session(t);
--
1.7.2.3

2010-12-10 11:16:45

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC PATCH 4/4] allow killing tasks in your own or child userns

"Serge E. Hallyn" <[email protected]> writes:

> Changelog:
> Dec 8: Fixed bug in my check_kill_permission pointed out by
> Eric Biederman.
>
> To test:
> 1. Test killing tasks as usual. No change.
> 2. Clone a new user namespace without a new pidns.
> a. You CAN kill -CONT tasks in your thread group but outside
> your user ns.
> b. You can NOT otherwise kill tasks outside your user_ns.
> c. Inside your new userns, signal semantics are as normal
> with respect to userids, CAP_KILL, and thread groups.
>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> ---
> kernel/signal.c | 27 ++++++++++++++++++++++-----
> 1 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 4e3cff1..677025c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -635,6 +635,27 @@ static inline bool si_fromuser(const struct siginfo *info)
> (!is_si_special(info) && SI_FROMUSER(info));
> }
>
> +static inline int kill_ok_by_cred(struct cred *cred, struct cred *tcred)
> +{
Nit: You should just pass in the target task here.
Making it abundantly clear where current and tcred come from.
ns_capable implicitly uses current which is a little surprising
when everything else is being passed in, but makes perfect sense
in this context.

> + if (cred->user->user_ns != tcred->user->user_ns) {
> + /* userids are not equivalent - either you have the
> + capability to the target user ns or you don't */
> + if (ns_capable(tcred->user->user_ns, CAP_KILL))
> + return 1;
> + return 0;
> + }
> +
> + /* same user namespace - usual credentials checks apply */
> + if ((cred->euid ^ tcred->suid) &&
> + (cred->euid ^ tcred->uid) &&
> + (cred->uid ^ tcred->suid) &&
> + (cred->uid ^ tcred->uid) &&
> + !ns_capable(tcred->user->user_ns, CAP_KILL))
> + return 0;
> +
> + return 1;
> +}
> +
> /*
> * Bad permissions for sending the signal
> * - the caller must hold the RCU read lock
> @@ -659,11 +680,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
> cred = current_cred();
> tcred = __task_cred(t);
> if (!same_thread_group(current, t) &&
> - (cred->euid ^ tcred->suid) &&
> - (cred->euid ^ tcred->uid) &&
> - (cred->uid ^ tcred->suid) &&
> - (cred->uid ^ tcred->uid) &&
> - !capable(CAP_KILL)) {
> + !kill_ok_by_cred(cred, tcred)) {
> switch (sig) {
> case SIGCONT:
> sid = task_session(t);

2010-12-10 16:02:20

by Serge E. Hallyn

[permalink] [raw]

Subject: Re: [RFC PATCH 4/4] allow killing tasks in your own or child userns

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
> > +static inline int kill_ok_by_cred(struct cred *cred, struct cred *tcred)
> > +{
> Nit: You should just pass in the target task here.
> Making it abundantly clear where current and tcred come from.
> ns_capable implicitly uses current which is a little surprising
> when everything else is being passed in, but makes perfect sense
> in this context.

Thanks, that makes sense, will do.

If the set seems fine overall, then I'll also look at adding ptrace
controls, and hopefully send the result out next week.

thanks,
-serge

2010-12-11 00:51:27

by Alexey Dobriyan

[permalink] [raw]

Subject: Re: [RFC PATCH 1/4] Add a user_namespace as creator/owner of uts_namespace

On Thu, Dec 09, 2010 at 05:20:27PM +0000, Serge E. Hallyn wrote:
> struct uts_namespace {
> struct kref kref;
> struct new_utsname name;
> + struct user_namespace *user_ns;

You're going to add these to the rest?

2010-12-11 01:29:25

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC PATCH 1/4] Add a user_namespace as creator/owner of uts_namespace

Alexey Dobriyan <[email protected]> writes:

> On Thu, Dec 09, 2010 at 05:20:27PM +0000, Serge E. Hallyn wrote:
>> struct uts_namespace {
>> struct kref kref;
>> struct new_utsname name;
>> + struct user_namespace *user_ns;
>
> You're going to add these to the rest?

That is the idea.

namespaces and other objects of interest will keep a user_ns
pointer referencing the context in which they were created.

I took a quick look and all of the namespaces seem to have capability
checks that we want to allow if you are in the proper context.

What is particularly interesting is that this makes a root user in
something other than the init_user_ns with all capabilities essentially
a normal user because capable(CAP_XXX) becomes capable(&init_user_ns, CAP_XXX).
And as such will always fail except where we have updated the capability
checks.

Eric