LinuxLists.cc - [PATCH 01/10] Add a user_namespace as creator/owner of uts_namespace

2011-02-24 15:01:38

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 01/10] Add a user_namespace as creator/owner of uts_namespace

copy_process() handles CLONE_NEWUSER before the rest of the
namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
the new uts namespace will have the new user namespace as its
owner. That is what we want, since we want root in that new
userns to be able to have privilege over it.

Changelog:
Feb 15: don't set uts_ns->user_ns if we didn't create
a new uts_ns.
Feb 23: Move extern init_user_ns declaration from
init/version.c to utsname.h.

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
include/linux/utsname.h | 4 ++++
init/version.c | 1 +
kernel/nsproxy.c | 5 +++++
kernel/user.c | 8 ++++++--
kernel/utsname.c | 4 ++++
5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 69f3997..2c3c0f5 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -37,9 +37,13 @@ struct new_utsname {
#include <linux/nsproxy.h>
#include <linux/err.h>

+struct user_namespace;
+extern struct user_namespace init_user_ns;
+
struct uts_namespace {
struct kref kref;
struct new_utsname name;
+ struct user_namespace *user_ns;
};
extern struct uts_namespace init_uts_ns;

diff --git a/init/version.c b/init/version.c
index adff586..86fe0cc 100644
--- a/init/version.c
+++ b/init/version.c
@@ -33,6 +33,7 @@ struct uts_namespace init_uts_ns = {
.machine = UTS_MACHINE,
.domainname = UTS_DOMAINNAME,
},
+ .user_ns = &init_user_ns,
};
EXPORT_SYMBOL_GPL(init_uts_ns);

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f74e6c0..034dc2e 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -74,6 +74,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
err = PTR_ERR(new_nsp->uts_ns);
goto out_uts;
}
+ if (new_nsp->uts_ns != tsk->nsproxy->uts_ns) {
+ put_user_ns(new_nsp->uts_ns->user_ns);
+ new_nsp->uts_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
+ get_user_ns(new_nsp->uts_ns->user_ns);
+ }

new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
if (IS_ERR(new_nsp->ipc_ns)) {
diff --git a/kernel/user.c b/kernel/user.c
index 5c598ca..9e03e9c 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -17,9 +17,13 @@
#include <linux/module.h>
#include <linux/user_namespace.h>

+/*
+ * userns count is 1 for root user, 1 for init_uts_ns,
+ * and 1 for... ?
+ */
struct user_namespace init_user_ns = {
.kref = {
- .refcount = ATOMIC_INIT(2),
+ .refcount = ATOMIC_INIT(3),
},
.creator = &root_user,
};
@@ -47,7 +51,7 @@ static struct kmem_cache *uid_cachep;
*/
static DEFINE_SPINLOCK(uidhash_lock);

-/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
+/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
struct user_struct root_user = {
.__count = ATOMIC_INIT(2),
.processes = ATOMIC_INIT(1),
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 8a82b4b..a7b3a8d 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -14,6 +14,7 @@
#include <linux/utsname.h>
#include <linux/err.h>
#include <linux/slab.h>
+#include <linux/user_namespace.h>

static struct uts_namespace *create_uts_ns(void)
{
@@ -40,6 +41,8 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)

down_read(&uts_sem);
memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
+ ns->user_ns = old_ns->user_ns;
+ get_user_ns(ns->user_ns);
up_read(&uts_sem);
return ns;
}
@@ -71,5 +74,6 @@ void free_uts_ns(struct kref *kref)
struct uts_namespace *ns;

ns = container_of(kref, struct uts_namespace, kref);
+ put_user_ns(ns->user_ns);
kfree(ns);
}
--
1.7.0.4

2011-02-24 15:02:10

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 02/10] security: Make capabilities relative to the user namespace.

- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.

The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.

I still need to handle getting all caps after creation but otherwise I
think I have a good starter patch that achieves all of your goals.

Changelog:
11/05/2010: [serge] add apparmor
12/14/2010: [serge] fix capabilities to created user namespaces
Without this, if user serge creates a user_ns, he won't have
capabilities to the user_ns he created. THis is because we
were first checking whether his effective caps had the caps
he needed and returning -EPERM if not, and THEN checking whether
he was the creator. Reverse those checks.
12/16/2010: [serge] security_real_capable needs ns argument in !security case
01/11/2011: [serge] add task_ns_capable helper
01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
02/16/2011: [serge] fix a logic bug: the root user is always creator of
init_user_ns, but should not always have capabilities to
it! Fix the check in cap_capable().
02/21/2011: Add the required user_ns parameter to security_capable,
fixing a compile failure.
02/23/2011: Convert some macros to functions as per akpm comments. Some
couldn't be converted because we can't easily forward-declare
them (they are inline if !SECURITY, extern if SECURITY). Add
a current_user_ns function so we can use it in capability.h
without #including cred.h. Move all forward declarations
together to the top of the #ifdef __KERNEL__ section, and use
kernel-doc format.
02/23/2011: Per dhowells, clean up comment in cap_capable().
02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable.

(Original written and signed off by Eric; latest, modified version
acked by him)

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
drivers/pci/pci-sysfs.c | 2 +-
include/linux/capability.h | 36 ++++++++++++++++++++++++++++--------
include/linux/cred.h | 4 +++-
include/linux/security.h | 25 ++++++++++++++-----------
kernel/capability.c | 42 +++++++++++++++++++++++++++++++++++++-----
kernel/cred.c | 5 +++++
security/apparmor/lsm.c | 5 +++--
security/commoncap.c | 38 +++++++++++++++++++++++++++++++-------
security/security.c | 16 ++++++++++------
security/selinux/hooks.c | 14 +++++++++-----
10 files changed, 141 insertions(+), 46 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index ea25e5b..90a6b04 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -369,7 +369,7 @@ pci_read_config(struct file *filp, struct kobject *kobj,
u8 *data = (u8*) buf;

/* Several chips lock up trying to read undefined config space */
- if (security_capable(filp->f_cred, CAP_SYS_ADMIN) == 0) {
+ if (security_capable(&init_user_ns, filp->f_cred, CAP_SYS_ADMIN) == 0) {
size = dev->cfg_size;
} else if (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS) {
size = 128;
diff --git a/include/linux/capability.h b/include/linux/capability.h
index fb16a36..7c9c829 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -368,6 +368,17 @@ struct cpu_vfs_cap_data {

#ifdef __KERNEL__

+struct dentry;
+struct user_namespace;
+
+extern struct user_namespace init_user_ns;
+
+struct user_namespace *current_user_ns(void);
+
+extern const kernel_cap_t __cap_empty_set;
+extern const kernel_cap_t __cap_full_set;
+extern const kernel_cap_t __cap_init_eff_set;
+
/*
* Internal kernel functions only
*/
@@ -530,10 +541,6 @@ static inline kernel_cap_t cap_raise_nfsd_set(const kernel_cap_t a,
cap_intersect(permitted, __cap_nfsd_set));
}

-extern const kernel_cap_t __cap_empty_set;
-extern const kernel_cap_t __cap_full_set;
-extern const kernel_cap_t __cap_init_eff_set;
-
/**
* has_capability - Determine if a task has a superior capability available
* @t: The task in question
@@ -544,7 +551,7 @@ extern const kernel_cap_t __cap_init_eff_set;
*
* Note that this does not set PF_SUPERPRIV on the task.
*/
-#define has_capability(t, cap) (security_real_capable((t), (cap)) == 0)
+#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)

/**
* has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
@@ -558,12 +565,25 @@ extern const kernel_cap_t __cap_init_eff_set;
* Note that this does not set PF_SUPERPRIV on the task.
*/
#define has_capability_noaudit(t, cap) \
- (security_real_capable_noaudit((t), (cap)) == 0)
+ (security_real_capable_noaudit((t), &init_user_ns, (cap)) == 0)

-extern int capable(int cap);
+extern bool capable(int cap);
+extern bool ns_capable(struct user_namespace *ns, int cap);
+extern bool task_ns_capable(struct task_struct *t, int cap);
+
+/**
+ * nsown_capable - Check superior capability to one's own user_ns
+ * @cap: The capability in question
+ *
+ * Return true if the current task has the given superior capability
+ * targeted at its own user namespace.
+ */
+static inline bool nsown_capable(int cap)
+{
+ return ns_capable(current_user_ns(), cap);
+}

/* audit system wants to get cap info from files as well */
-struct dentry;
extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);

#endif /* __KERNEL__ */
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 4aaeab3..9aeeb0b 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -354,9 +354,11 @@ static inline void put_cred(const struct cred *_cred)
#define current_fsgid() (current_cred_xxx(fsgid))
#define current_cap() (current_cred_xxx(cap_effective))
#define current_user() (current_cred_xxx(user))
-#define current_user_ns() (current_cred_xxx(user)->user_ns)
+#define _current_user_ns() (current_cred_xxx(user)->user_ns)
#define current_security() (current_cred_xxx(security))

+extern struct user_namespace *current_user_ns(void);
+
#define current_uid_gid(_uid, _gid) \
do { \
const struct cred *__cred; \
diff --git a/include/linux/security.h b/include/linux/security.h
index b2b7f97..6bbee08 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -46,13 +46,14 @@

struct ctl_table;
struct audit_krule;
+struct user_namespace;

/*
* These functions are in security/capability.c and are used
* as the default capabilities functions
*/
extern int cap_capable(struct task_struct *tsk, const struct cred *cred,
- int cap, int audit);
+ struct user_namespace *ns, int cap, int audit);
extern int cap_settime(struct timespec *ts, struct timezone *tz);
extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
extern int cap_ptrace_traceme(struct task_struct *parent);
@@ -1254,6 +1255,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* credentials.
* @tsk contains the task_struct for the process.
* @cred contains the credentials to use.
+ * @ns contains the user namespace we want the capability in
* @cap contains the capability <include/linux/capability.h>.
* @audit: Whether to write an audit message or not
* Return 0 if the capability is granted for @tsk.
@@ -1382,7 +1384,7 @@ struct security_operations {
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
int (*capable) (struct task_struct *tsk, const struct cred *cred,
- int cap, int audit);
+ struct user_namespace *ns, int cap, int audit);
int (*sysctl) (struct ctl_table *table, int op);
int (*quotactl) (int cmds, int type, int id, struct super_block *sb);
int (*quota_on) (struct dentry *dentry);
@@ -1662,9 +1664,9 @@ int security_capset(struct cred *new, const struct cred *old,
const kernel_cap_t *effective,
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
-int security_capable(const struct cred *cred, int cap);
-int security_real_capable(struct task_struct *tsk, int cap);
-int security_real_capable_noaudit(struct task_struct *tsk, int cap);
+int security_capable(struct user_namespace *ns, const struct cred *cred, int cap);
+int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap);
+int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap);
int security_sysctl(struct ctl_table *table, int op);
int security_quotactl(int cmds, int type, int id, struct super_block *sb);
int security_quota_on(struct dentry *dentry);
@@ -1856,28 +1858,29 @@ static inline int security_capset(struct cred *new,
return cap_capset(new, old, effective, inheritable, permitted);
}

-static inline int security_capable(const struct cred *cred, int cap)
+static inline int security_capable(struct user_namespace *ns,
+ const struct cred *cred, int cap)
{
- return cap_capable(current, cred, cap, SECURITY_CAP_AUDIT);
+ return cap_capable(current, cred, ns, cap, SECURITY_CAP_AUDIT);
}

-static inline int security_real_capable(struct task_struct *tsk, int cap)
+static inline int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap)
{
int ret;

rcu_read_lock();
- ret = cap_capable(tsk, __task_cred(tsk), cap, SECURITY_CAP_AUDIT);
+ ret = cap_capable(tsk, __task_cred(tsk), ns, cap, SECURITY_CAP_AUDIT);
rcu_read_unlock();
return ret;
}

static inline
-int security_real_capable_noaudit(struct task_struct *tsk, int cap)
+int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap)
{
int ret;

rcu_read_lock();
- ret = cap_capable(tsk, __task_cred(tsk), cap,
+ ret = cap_capable(tsk, __task_cred(tsk), ns, cap,
SECURITY_CAP_NOAUDIT);
rcu_read_unlock();
return ret;
diff --git a/kernel/capability.c b/kernel/capability.c
index 9e9385f..0a3d2c8 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -14,6 +14,7 @@
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/*
@@ -299,17 +300,48 @@ error:
* This sets PF_SUPERPRIV on the task if the capability is available on the
* assumption that it's about to be used.
*/
-int capable(int cap)
+bool capable(int cap)
+{
+ return ns_capable(&init_user_ns, cap);
+}
+EXPORT_SYMBOL(capable);
+
+/**
+ * ns_capable - Determine if the current task has a superior capability in effect
+ * @ns: The usernamespace we want the capability in
+ * @cap: The capability to be tested for
+ *
+ * Return true if the current task has the given superior capability currently
+ * available for use, false if not.
+ *
+ * This sets PF_SUPERPRIV on the task if the capability is available on the
+ * assumption that it's about to be used.
+ */
+bool ns_capable(struct user_namespace *ns, int cap)
{
if (unlikely(!cap_valid(cap))) {
printk(KERN_CRIT "capable() called with invalid cap=%u\n", cap);
BUG();
}

- if (security_capable(current_cred(), cap) == 0) {
+ if (security_capable(ns, current_cred(), cap) == 0) {
current->flags |= PF_SUPERPRIV;
- return 1;
+ return true;
}
- return 0;
+ return false;
}
-EXPORT_SYMBOL(capable);
+EXPORT_SYMBOL(ns_capable);
+
+/**
+ * task_ns_capable - Determine whether current task has a superior
+ * capability targeted at a specific task's user namespace.
+ * @t: The task whose user namespace is targeted.
+ * @cap: The capability in question.
+ *
+ * Return true if it does, false otherwise.
+ */
+bool task_ns_capable(struct task_struct *t, int cap)
+{
+ return ns_capable(task_cred_xxx(t, user)->user_ns, cap);
+}
+EXPORT_SYMBOL(task_ns_capable);
diff --git a/kernel/cred.c b/kernel/cred.c
index 3a9d6dd..e447fa2 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -741,6 +741,11 @@ int set_create_files_as(struct cred *new, struct inode *inode)
}
EXPORT_SYMBOL(set_create_files_as);

+struct user_namespace *current_user_ns(void)
+{
+ return _current_user_ns();
+}
+
#ifdef CONFIG_DEBUG_CREDENTIALS

bool creds_are_invalid(const struct cred *cred)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index b7106f1..b37c2cd 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/sysctl.h>
#include <linux/audit.h>
+#include <linux/user_namespace.h>
#include <net/sock.h>

#include "include/apparmor.h"
@@ -136,11 +137,11 @@ static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
}

static int apparmor_capable(struct task_struct *task, const struct cred *cred,
- int cap, int audit)
+ struct user_namespace *ns, int cap, int audit)
{
struct aa_profile *profile;
/* cap_capable returns 0 on success, else -EPERM */
- int error = cap_capable(task, cred, cap, audit);
+ int error = cap_capable(task, cred, ns, cap, audit);
if (!error) {
profile = aa_cred_profile(cred);
if (!unconfined(profile))
diff --git a/security/commoncap.c b/security/commoncap.c
index 64c2ed9..6f4c327 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>
#include <linux/prctl.h>
#include <linux/securebits.h>
+#include <linux/user_namespace.h>

/*
* If a non-root user executes a setuid-root binary in
@@ -68,6 +69,7 @@ EXPORT_SYMBOL(cap_netlink_recv);
* cap_capable - Determine whether a task has a particular effective capability
* @tsk: The task to query
* @cred: The credentials to use
+ * @ns: The user namespace in which we need the capability
* @cap: The capability to check for
* @audit: Whether to write an audit message or not
*
@@ -79,10 +81,30 @@ EXPORT_SYMBOL(cap_netlink_recv);
* cap_has_capability() returns 0 when a task has a capability, but the
* kernel's capable() and has_capability() returns 1 for this case.
*/
-int cap_capable(struct task_struct *tsk, const struct cred *cred, int cap,
- int audit)
+int cap_capable(struct task_struct *tsk, const struct cred *cred,
+ struct user_namespace *targ_ns, int cap, int audit)
{
- return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+ for (;;) {
+ /* The creator of the user namespace has all caps. */
+ if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
+ return 0;
+
+ /* Do we have the necessary capabilities? */
+ if (targ_ns == cred->user->user_ns)
+ return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+
+ /* Have we tried all of the parent namespaces? */
+ if (targ_ns == &init_user_ns)
+ return -EPERM;
+
+ /*
+ *If you have a capability in a parent user ns, then you have
+ * it over all children user namespaces as well.
+ */
+ targ_ns = targ_ns->creator->user_ns;
+ }
+
+ /* We never get here */
}

/**
@@ -177,7 +199,8 @@ static inline int cap_inh_is_capped(void)
/* they are so limited unless the current task has the CAP_SETPCAP
* capability
*/
- if (cap_capable(current, current_cred(), CAP_SETPCAP,
+ if (cap_capable(current, current_cred(),
+ current_cred()->user->user_ns, CAP_SETPCAP,
SECURITY_CAP_AUDIT) == 0)
return 0;
return 1;
@@ -829,7 +852,8 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
& (new->securebits ^ arg2)) /*[1]*/
|| ((new->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/
|| (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/
- || (cap_capable(current, current_cred(), CAP_SETPCAP,
+ || (cap_capable(current, current_cred(),
+ current_cred()->user->user_ns, CAP_SETPCAP,
SECURITY_CAP_AUDIT) != 0) /*[4]*/
/*
* [1] no changing of bits that are locked
@@ -894,7 +918,7 @@ int cap_vm_enough_memory(struct mm_struct *mm, long pages)
{
int cap_sys_admin = 0;

- if (cap_capable(current, current_cred(), CAP_SYS_ADMIN,
+ if (cap_capable(current, current_cred(), &init_user_ns, CAP_SYS_ADMIN,
SECURITY_CAP_NOAUDIT) == 0)
cap_sys_admin = 1;
return __vm_enough_memory(mm, pages, cap_sys_admin);
@@ -921,7 +945,7 @@ int cap_file_mmap(struct file *file, unsigned long reqprot,
int ret = 0;

if (addr < dac_mmap_min_addr) {
- ret = cap_capable(current, current_cred(), CAP_SYS_RAWIO,
+ ret = cap_capable(current, current_cred(), &init_user_ns, CAP_SYS_RAWIO,
SECURITY_CAP_AUDIT);
/* set PF_SUPERPRIV if it turns out we allow the low mmap */
if (ret == 0)
diff --git a/security/security.c b/security/security.c
index 7b7308a..7a6a0d0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -154,29 +154,33 @@ int security_capset(struct cred *new, const struct cred *old,
effective, inheritable, permitted);
}

-int security_capable(const struct cred *cred, int cap)
+int security_capable(struct user_namespace *ns, const struct cred *cred,
+ int cap)
{
- return security_ops->capable(current, cred, cap, SECURITY_CAP_AUDIT);
+ return security_ops->capable(current, cred, ns, cap,
+ SECURITY_CAP_AUDIT);
}

-int security_real_capable(struct task_struct *tsk, int cap)
+int security_real_capable(struct task_struct *tsk, struct user_namespace *ns,
+ int cap)
{
const struct cred *cred;
int ret;

cred = get_task_cred(tsk);
- ret = security_ops->capable(tsk, cred, cap, SECURITY_CAP_AUDIT);
+ ret = security_ops->capable(tsk, cred, ns, cap, SECURITY_CAP_AUDIT);
put_cred(cred);
return ret;
}

-int security_real_capable_noaudit(struct task_struct *tsk, int cap)
+int security_real_capable_noaudit(struct task_struct *tsk,
+ struct user_namespace *ns, int cap)
{
const struct cred *cred;
int ret;

cred = get_task_cred(tsk);
- ret = security_ops->capable(tsk, cred, cap, SECURITY_CAP_NOAUDIT);
+ ret = security_ops->capable(tsk, cred, ns, cap, SECURITY_CAP_NOAUDIT);
put_cred(cred);
return ret;
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c8d6992..6dcda48 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -77,6 +77,7 @@
#include <linux/mutex.h>
#include <linux/posix-timers.h>
#include <linux/syslog.h>
+#include <linux/user_namespace.h>

#include "avc.h"
#include "objsec.h"
@@ -1423,6 +1424,7 @@ static int current_has_perm(const struct task_struct *tsk,
/* Check whether a task is allowed to use a capability. */
static int task_has_capability(struct task_struct *tsk,
const struct cred *cred,
+ struct user_namespace *ns,
int cap, int audit)
{
struct common_audit_data ad;
@@ -1851,15 +1853,15 @@ static int selinux_capset(struct cred *new, const struct cred *old,
*/

static int selinux_capable(struct task_struct *tsk, const struct cred *cred,
- int cap, int audit)
+ struct user_namespace *ns, int cap, int audit)
{
int rc;

- rc = cap_capable(tsk, cred, cap, audit);
+ rc = cap_capable(tsk, cred, ns, cap, audit);
if (rc)
return rc;

- return task_has_capability(tsk, cred, cap, audit);
+ return task_has_capability(tsk, cred, ns, cap, audit);
}

static int selinux_sysctl_get_sid(ctl_table *table, u16 tclass, u32 *sid)
@@ -2012,7 +2014,8 @@ static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
{
int rc, cap_sys_admin = 0;

- rc = selinux_capable(current, current_cred(), CAP_SYS_ADMIN,
+ rc = selinux_capable(current, current_cred(),
+ &init_user_ns, CAP_SYS_ADMIN,
SECURITY_CAP_NOAUDIT);
if (rc == 0)
cap_sys_admin = 1;
@@ -2829,7 +2832,8 @@ static int selinux_inode_getsecurity(const struct inode *inode, const char *name
* and lack of permission just means that we fall back to the
* in-core context value, not a denial.
*/
- error = selinux_capable(current, current_cred(), CAP_MAC_ADMIN,
+ error = selinux_capable(current, current_cred(),
+ &init_user_ns, CAP_MAC_ADMIN,
SECURITY_CAP_NOAUDIT);
if (!error)
error = security_sid_to_context_force(isec->sid, &context,
--
1.7.0.4

2011-02-24 15:02:16

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 03/10] allow sethostname in a container

Changelog:
Feb 23: let clone_uts_ns() handle setting uts->user_ns
To do so we need to pass in the task_struct who'll
get the utsname, so we can get its user_ns.
Feb 23: As per Oleg's coment, just pass in tsk, instead of two
of its members.

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
include/linux/utsname.h | 6 +++---
kernel/nsproxy.c | 7 +------
kernel/sys.c | 2 +-
kernel/utsname.c | 12 +++++++-----
4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 2c3c0f5..4e5b021 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -54,7 +54,7 @@ static inline void get_uts_ns(struct uts_namespace *ns)
}

extern struct uts_namespace *copy_utsname(unsigned long flags,
- struct uts_namespace *ns);
+ struct task_struct *tsk);
extern void free_uts_ns(struct kref *kref);

static inline void put_uts_ns(struct uts_namespace *ns)
@@ -71,12 +71,12 @@ static inline void put_uts_ns(struct uts_namespace *ns)
}

static inline struct uts_namespace *copy_utsname(unsigned long flags,
- struct uts_namespace *ns)
+ struct task_struct *tsk)
{
if (flags & CLONE_NEWUTS)
return ERR_PTR(-EINVAL);

- return ns;
+ return tsk->nsproxy->uts_ns;
}
#endif

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 034dc2e..b97fc9d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -69,16 +69,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
goto out_ns;
}

- new_nsp->uts_ns = copy_utsname(flags, tsk->nsproxy->uts_ns);
+ new_nsp->uts_ns = copy_utsname(flags, tsk);
if (IS_ERR(new_nsp->uts_ns)) {
err = PTR_ERR(new_nsp->uts_ns);
goto out_uts;
}
- if (new_nsp->uts_ns != tsk->nsproxy->uts_ns) {
- put_user_ns(new_nsp->uts_ns->user_ns);
- new_nsp->uts_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
- get_user_ns(new_nsp->uts_ns->user_ns);
- }

new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
if (IS_ERR(new_nsp->ipc_ns)) {
diff --git a/kernel/sys.c b/kernel/sys.c
index 18da702..7a1bbad 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1177,7 +1177,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
int errno;
char tmp[__NEW_UTS_LEN];

- if (!capable(CAP_SYS_ADMIN))
+ if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index a7b3a8d..4464617 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -31,7 +31,8 @@ static struct uts_namespace *create_uts_ns(void)
* @old_ns: namespace to clone
* Return NULL on error (failure to kmalloc), new ns otherwise
*/
-static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)
+static struct uts_namespace *clone_uts_ns(struct task_struct *tsk,
+ struct uts_namespace *old_ns)
{
struct uts_namespace *ns;

@@ -41,8 +42,7 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)

down_read(&uts_sem);
memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
- ns->user_ns = old_ns->user_ns;
- get_user_ns(ns->user_ns);
+ ns->user_ns = get_user_ns(task_cred_xxx(tsk, user)->user_ns);
up_read(&uts_sem);
return ns;
}
@@ -53,8 +53,10 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)
* utsname of this process won't be seen by parent, and vice
* versa.
*/
-struct uts_namespace *copy_utsname(unsigned long flags, struct uts_namespace *old_ns)
+struct uts_namespace *copy_utsname(unsigned long flags,
+ struct task_struct *tsk)
{
+ struct uts_namespace *old_ns = tsk->nsproxy->uts_ns;
struct uts_namespace *new_ns;

BUG_ON(!old_ns);
@@ -63,7 +65,7 @@ struct uts_namespace *copy_utsname(unsigned long flags, struct uts_namespace *ol
if (!(flags & CLONE_NEWUTS))
return old_ns;

- new_ns = clone_uts_ns(old_ns);
+ new_ns = clone_uts_ns(tsk, old_ns);

put_uts_ns(old_ns);
return new_ns;
--
1.7.0.4

2011-02-24 15:02:24

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 04/10] allow killing tasks in your own or child userns

Changelog:
Dec 8: Fixed bug in my check_kill_permission pointed out by
Eric Biederman.
Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred()
for clarity
Dec 31: address comment by Eric Biederman:
don't need cred/tcred in check_kill_permission.
Jan 1: use const cred struct.
Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred().
Feb 16: kill_ok_by_cred: fix bad parentheses
Feb 23: per akpm, let compiler inline kill_ok_by_cred

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
kernel/signal.c | 30 ++++++++++++++++++++++--------
1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..12702b4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -636,13 +636,33 @@ static inline bool si_fromuser(const struct siginfo *info)
}

/*
+ * called with RCU read lock from check_kill_permission()
+ */
+static int kill_ok_by_cred(struct task_struct *t)
+{
+ const struct cred *cred = current_cred();
+ const struct cred *tcred = __task_cred(t);
+
+ if (cred->user->user_ns == tcred->user->user_ns &&
+ (cred->euid == tcred->suid ||
+ cred->euid == tcred->uid ||
+ cred->uid == tcred->suid ||
+ cred->uid == tcred->uid))
+ return 1;
+
+ if (ns_capable(tcred->user->user_ns, CAP_KILL))
+ return 1;
+
+ return 0;
+}
+
+/*
* Bad permissions for sending the signal
* - the caller must hold the RCU read lock
*/
static int check_kill_permission(int sig, struct siginfo *info,
struct task_struct *t)
{
- const struct cred *cred, *tcred;
struct pid *sid;
int error;

@@ -656,14 +676,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
if (error)
return error;

- cred = current_cred();
- tcred = __task_cred(t);
if (!same_thread_group(current, t) &&
- (cred->euid ^ tcred->suid) &&
- (cred->euid ^ tcred->uid) &&
- (cred->uid ^ tcred->suid) &&
- (cred->uid ^ tcred->uid) &&
- !capable(CAP_KILL)) {
+ !kill_ok_by_cred(t)) {
switch (sig) {
case SIGCONT:
sid = task_session(t);
--
1.7.0.4

2011-02-24 15:02:39

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 06/10] user namespaces: convert all capable checks in kernel/sys.c

This allows setuid/setgid in containers. It also fixes some
corner cases where kernel logic foregoes capability checks when
uids are equivalent. The latter will need to be done throughout
the whole kernel.

Changelog:
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Fix logic errors in uid checks pointed out by Bastian.
Feb 15: allow prlimit to current (was regression in previous version)
Feb 23: remove debugging printks, uninline set_one_prio_perm and
make it bool, and document its return value.

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
kernel/sys.c | 75 +++++++++++++++++++++++++++++++++++++--------------------
1 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 7a1bbad..ba2b473 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -119,16 +119,33 @@ EXPORT_SYMBOL(cad_pid);
void (*pm_power_off_prepare)(void);

/*
+ * Returns true if current's euid is same as p's uid or euid,
+ * or has CAP_SYS_NICE to p's user_ns.
+ *
+ * Called with rcu_read_lock, creds are safe
+ */
+static bool set_one_prio_perm(struct task_struct *p)
+{
+ const struct cred *cred = current_cred(), *pcred = __task_cred(p);
+
+ if (pcred->user->user_ns == cred->user->user_ns &&
+ (pcred->uid == cred->euid ||
+ pcred->euid == cred->euid))
+ return true;
+ if (ns_capable(pcred->user->user_ns, CAP_SYS_NICE))
+ return true;
+ return false;
+}
+
+/*
* set the priority of a task
* - the caller must hold the RCU read lock
*/
static int set_one_prio(struct task_struct *p, int niceval, int error)
{
- const struct cred *cred = current_cred(), *pcred = __task_cred(p);
int no_nice;

- if (pcred->uid != cred->euid &&
- pcred->euid != cred->euid && !capable(CAP_SYS_NICE)) {
+ if (!set_one_prio_perm(p)) {
error = -EPERM;
goto out;
}
@@ -502,7 +519,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
if (rgid != (gid_t) -1) {
if (old->gid == rgid ||
old->egid == rgid ||
- capable(CAP_SETGID))
+ nsown_capable(CAP_SETGID))
new->gid = rgid;
else
goto error;
@@ -511,7 +528,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
if (old->gid == egid ||
old->egid == egid ||
old->sgid == egid ||
- capable(CAP_SETGID))
+ nsown_capable(CAP_SETGID))
new->egid = egid;
else
goto error;
@@ -546,7 +563,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
old = current_cred();

retval = -EPERM;
- if (capable(CAP_SETGID))
+ if (nsown_capable(CAP_SETGID))
new->gid = new->egid = new->sgid = new->fsgid = gid;
else if (gid == old->gid || gid == old->sgid)
new->egid = new->fsgid = gid;
@@ -613,7 +630,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
new->uid = ruid;
if (old->uid != ruid &&
old->euid != ruid &&
- !capable(CAP_SETUID))
+ !nsown_capable(CAP_SETUID))
goto error;
}

@@ -622,7 +639,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
if (old->uid != euid &&
old->euid != euid &&
old->suid != euid &&
- !capable(CAP_SETUID))
+ !nsown_capable(CAP_SETUID))
goto error;
}

@@ -670,7 +687,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
old = current_cred();

retval = -EPERM;
- if (capable(CAP_SETUID)) {
+ if (nsown_capable(CAP_SETUID)) {
new->suid = new->uid = uid;
if (uid != old->uid) {
retval = set_user(new);
@@ -712,7 +729,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
old = current_cred();

retval = -EPERM;
- if (!capable(CAP_SETUID)) {
+ if (!nsown_capable(CAP_SETUID)) {
if (ruid != (uid_t) -1 && ruid != old->uid &&
ruid != old->euid && ruid != old->suid)
goto error;
@@ -776,7 +793,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
old = current_cred();

retval = -EPERM;
- if (!capable(CAP_SETGID)) {
+ if (!nsown_capable(CAP_SETGID)) {
if (rgid != (gid_t) -1 && rgid != old->gid &&
rgid != old->egid && rgid != old->sgid)
goto error;
@@ -836,7 +853,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)

if (uid == old->uid || uid == old->euid ||
uid == old->suid || uid == old->fsuid ||
- capable(CAP_SETUID)) {
+ nsown_capable(CAP_SETUID)) {
if (uid != old_fsuid) {
new->fsuid = uid;
if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
@@ -869,7 +886,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)

if (gid == old->gid || gid == old->egid ||
gid == old->sgid || gid == old->fsgid ||
- capable(CAP_SETGID)) {
+ nsown_capable(CAP_SETGID)) {
if (gid != old_fsgid) {
new->fsgid = gid;
goto change_okay;
@@ -1179,6 +1196,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)

if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
+
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
down_write(&uts_sem);
@@ -1226,7 +1244,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
int errno;
char tmp[__NEW_UTS_LEN];

- if (!capable(CAP_SYS_ADMIN))
+ if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
@@ -1341,6 +1359,8 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
rlim = tsk->signal->rlim + resource;
task_lock(tsk->group_leader);
if (new_rlim) {
+ /* Keep the capable check against init_user_ns until
+ cgroups can contain all limits */
if (new_rlim->rlim_max > rlim->rlim_max &&
!capable(CAP_SYS_RESOURCE))
retval = -EPERM;
@@ -1384,19 +1404,22 @@ static int check_prlimit_permission(struct task_struct *task)
{
const struct cred *cred = current_cred(), *tcred;

- tcred = __task_cred(task);
- if (current != task &&
- (cred->uid != tcred->euid ||
- cred->uid != tcred->suid ||
- cred->uid != tcred->uid ||
- cred->gid != tcred->egid ||
- cred->gid != tcred->sgid ||
- cred->gid != tcred->gid) &&
- !capable(CAP_SYS_RESOURCE)) {
- return -EPERM;
- }
+ if (current == task)
+ return 0;

- return 0;
+ tcred = __task_cred(task);
+ if (cred->user->user_ns == tcred->user->user_ns &&
+ (cred->uid == tcred->euid &&
+ cred->uid == tcred->suid &&
+ cred->uid == tcred->uid &&
+ cred->gid == tcred->egid &&
+ cred->gid == tcred->sgid &&
+ cred->gid == tcred->gid))
+ return 0;
+ if (ns_capable(tcred->user->user_ns, CAP_SYS_RESOURCE))
+ return 0;
+
+ return -EPERM;
}

SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
--
1.7.0.4

2011-02-24 15:02:32

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 05/10] Allow ptrace from non-init user namespaces

ptrace is allowed to tasks in the same user namespace according to
the usual rules (i.e. the same rules as for two tasks in the init
user namespace). ptrace is also allowed to a user namespace to
which the current task the has CAP_SYS_PTRACE capability.

Changelog:
Dec 31: Address feedback by Eric:
. Correct ptrace uid check
. Rename may_ptrace_ns to ptrace_capable
. Also fix the cap_ptrace checks.
Jan 1: Use const cred struct
Jan 11: use task_ns_capable() in place of ptrace_capable().
Feb 23: same_or_ancestore_user_ns() was not an appropriate
check to constrain cap_issubset. Rather, cap_issubset()
only is meaningful when both capsets are in the same
user_ns.

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
include/linux/capability.h | 2 ++
kernel/ptrace.c | 27 +++++++++++++++------------
security/commoncap.c | 40 ++++++++++++++++++++++++++++++++--------
3 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 7c9c829..2ec4a8c 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -553,6 +553,8 @@ static inline kernel_cap_t cap_raise_nfsd_set(const kernel_cap_t a,
*/
#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)

+#define has_ns_capability(t, ns, cap) (security_real_capable((t), (ns), (cap)) == 0)
+
/**
* has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
* @t: The task in question
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1708b1e..cde4655 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -134,21 +134,24 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
return 0;
rcu_read_lock();
tcred = __task_cred(task);
- if ((cred->uid != tcred->euid ||
- cred->uid != tcred->suid ||
- cred->uid != tcred->uid ||
- cred->gid != tcred->egid ||
- cred->gid != tcred->sgid ||
- cred->gid != tcred->gid) &&
- !capable(CAP_SYS_PTRACE)) {
- rcu_read_unlock();
- return -EPERM;
- }
+ if (cred->user->user_ns == tcred->user->user_ns &&
+ (cred->uid == tcred->euid &&
+ cred->uid == tcred->suid &&
+ cred->uid == tcred->uid &&
+ cred->gid == tcred->egid &&
+ cred->gid == tcred->sgid &&
+ cred->gid == tcred->gid))
+ goto ok;
+ if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
+ goto ok;
+ rcu_read_unlock();
+ return -EPERM;
+ok:
rcu_read_unlock();
smp_rmb();
if (task->mm)
dumpable = get_dumpable(task->mm);
- if (!dumpable && !capable(CAP_SYS_PTRACE))
+ if (!dumpable && !task_ns_capable(task, CAP_SYS_PTRACE))
return -EPERM;

return security_ptrace_access_check(task, mode);
@@ -198,7 +201,7 @@ int ptrace_attach(struct task_struct *task)
goto unlock_tasklist;

task->ptrace = PT_PTRACED;
- if (capable(CAP_SYS_PTRACE))
+ if (task_ns_capable(task, CAP_SYS_PTRACE))
task->ptrace |= PT_PTRACE_CAP;

__ptrace_link(task, current);
diff --git a/security/commoncap.c b/security/commoncap.c
index 6f4c327..5700ba5 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -128,18 +128,30 @@ int cap_settime(struct timespec *ts, struct timezone *tz)
* @child: The process to be accessed
* @mode: The mode of attachment.
*
+ * If we are in the same or an ancestor user_ns and have all the target
+ * task's capabilities, then ptrace access is allowed.
+ * If we have the ptrace capability to the target user_ns, then ptrace
+ * access is allowed.
+ * Else denied.
+ *
* Determine whether a process may access another, returning 0 if permission
* granted, -ve if denied.
*/
int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
{
int ret = 0;
+ const struct cred *cred, *child_cred;

rcu_read_lock();
- if (!cap_issubset(__task_cred(child)->cap_permitted,
- current_cred()->cap_permitted) &&
- !capable(CAP_SYS_PTRACE))
- ret = -EPERM;
+ cred = current_cred();
+ child_cred = __task_cred(child);
+ if (cred->user->user_ns == child_cred->user->user_ns &&
+ cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
+ goto out;
+ if (ns_capable(child_cred->user->user_ns, CAP_SYS_PTRACE))
+ goto out;
+ ret = -EPERM;
+out:
rcu_read_unlock();
return ret;
}
@@ -148,18 +160,30 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
* cap_ptrace_traceme - Determine whether another process may trace the current
* @parent: The task proposed to be the tracer
*
+ * If parent is in the same or an ancestor user_ns and has all current's
+ * capabilities, then ptrace access is allowed.
+ * If parent has the ptrace capability to current's user_ns, then ptrace
+ * access is allowed.
+ * Else denied.
+ *
* Determine whether the nominated task is permitted to trace the current
* process, returning 0 if permission is granted, -ve if denied.
*/
int cap_ptrace_traceme(struct task_struct *parent)
{
int ret = 0;
+ const struct cred *cred, *child_cred;

rcu_read_lock();
- if (!cap_issubset(current_cred()->cap_permitted,
- __task_cred(parent)->cap_permitted) &&
- !has_capability(parent, CAP_SYS_PTRACE))
- ret = -EPERM;
+ cred = __task_cred(parent);
+ child_cred = current_cred();
+ if (cred->user->user_ns == child_cred->user->user_ns &&
+ cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
+ goto out;
+ if (has_ns_capability(parent, child_cred->user->user_ns, CAP_SYS_PTRACE))
+ goto out;
+ ret = -EPERM;
+out:
rcu_read_unlock();
return ret;
}
--
1.7.0.4

2011-02-24 15:02:50

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 07/10] add a user namespace owner of ipc ns

Changelog:
Feb 15: Don't set new ipc->user_ns if we didn't create a new
ipc_ns.
Feb 23: Move extern declaration to ipc_namespace.h, and group
fwd declarations at top.

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
include/linux/ipc_namespace.h | 4 ++++
ipc/msgutil.c | 1 +
ipc/namespace.c | 9 +++++++--
kernel/nsproxy.c | 5 +++++
4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 5195298..c8cdf0e 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -15,6 +15,8 @@

#define IPCNS_CALLBACK_PRI 0

+struct user_namespace;
+extern struct user_namespace init_user_ns;

struct ipc_ids {
int in_use;
@@ -56,6 +58,8 @@ struct ipc_namespace {
unsigned int mq_msg_max; /* initialized to DFLT_MSGMAX */
unsigned int mq_msgsize_max; /* initialized to DFLT_MSGSIZEMAX */

+ /* user_ns which owns the ipc ns */
+ struct user_namespace *user_ns;
};

extern struct ipc_namespace init_ipc_ns;
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index f095ee2..8b5ce5d 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -32,6 +32,7 @@ struct ipc_namespace init_ipc_ns = {
.mq_msg_max = DFLT_MSGMAX,
.mq_msgsize_max = DFLT_MSGSIZEMAX,
#endif
+ .user_ns = &init_user_ns,
};

atomic_t nr_ipc_ns = ATOMIC_INIT(1);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index a1094ff..aa18899 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -11,10 +11,11 @@
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/mount.h>
+#include <linux/user_namespace.h>

#include "util.h"

-static struct ipc_namespace *create_ipc_ns(void)
+static struct ipc_namespace *create_ipc_ns(struct ipc_namespace *old_ns)
{
struct ipc_namespace *ns;
int err;
@@ -43,6 +44,9 @@ static struct ipc_namespace *create_ipc_ns(void)
ipcns_notify(IPCNS_CREATED);
register_ipcns_notifier(ns);

+ ns->user_ns = old_ns->user_ns;
+ get_user_ns(ns->user_ns);
+
return ns;
}

@@ -50,7 +54,7 @@ struct ipc_namespace *copy_ipcs(unsigned long flags, struct ipc_namespace *ns)
{
if (!(flags & CLONE_NEWIPC))
return get_ipc_ns(ns);
- return create_ipc_ns();
+ return create_ipc_ns(ns);
}

/*
@@ -105,6 +109,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
* order to have a correct value when recomputing msgmni.
*/
ipcns_notify(IPCNS_REMOVED);
+ put_user_ns(ns->user_ns);
}

/*
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index b97fc9d..ac8a56e 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -80,6 +80,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
err = PTR_ERR(new_nsp->ipc_ns);
goto out_ipc;
}
+ if (new_nsp->ipc_ns != tsk->nsproxy->ipc_ns) {
+ put_user_ns(new_nsp->ipc_ns->user_ns);
+ new_nsp->ipc_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
+ get_user_ns(new_nsp->ipc_ns->user_ns);
+ }

new_nsp->pid_ns = copy_pid_ns(flags, task_active_pid_ns(tsk));
if (IS_ERR(new_nsp->pid_ns)) {
--
1.7.0.4

2011-02-24 15:02:58

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 08/10] user namespaces: convert several capable() calls

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be
against current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable
Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
Feb 23: pass ns down rather than taking it from current

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
include/linux/ipc_namespace.h | 7 ++++---
ipc/msg.c | 8 ++++----
ipc/namespace.c | 13 ++++++++-----
ipc/sem.c | 9 +++++----
ipc/shm.c | 9 +++++----
ipc/util.c | 22 +++++++++++++---------
ipc/util.h | 5 +++--
kernel/futex.c | 11 ++++++++++-
kernel/futex_compat.c | 11 ++++++++++-
kernel/groups.c | 2 +-
kernel/nsproxy.c | 7 +------
kernel/sched.c | 9 ++++++---
kernel/uid16.c | 2 +-
13 files changed, 71 insertions(+), 44 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index c8cdf0e..9ce8bf7 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -5,6 +5,7 @@
#include <linux/idr.h>
#include <linux/rwsem.h>
#include <linux/notifier.h>
+#include <linux/nsproxy.h>

/*
* ipc namespace events
@@ -94,7 +95,7 @@ static inline int mq_init_ns(struct ipc_namespace *ns) { return 0; }

#if defined(CONFIG_IPC_NS)
extern struct ipc_namespace *copy_ipcs(unsigned long flags,
- struct ipc_namespace *ns);
+ struct task_struct *tsk);
static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
{
if (ns)
@@ -105,12 +106,12 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
- struct ipc_namespace *ns)
+ struct task_struct *tsk)
{
if (flags & CLONE_NEWIPC)
return ERR_PTR(-EINVAL);

- return ns;
+ return tsk->nsproxy->ipc_ns;
}

static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
diff --git a/ipc/msg.c b/ipc/msg.c
index 747b655..0e732e9 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -421,7 +421,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
return -EFAULT;
}

- ipcp = ipcctl_pre_down(&msg_ids(ns), msqid, cmd,
+ ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
&msqid64.msg_perm, msqid64.msg_qbytes);
if (IS_ERR(ipcp))
return PTR_ERR(ipcp);
@@ -539,7 +539,7 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
success_return = 0;
}
err = -EACCES;
- if (ipcperms(&msq->q_perm, S_IRUGO))
+ if (ipcperms(ns, &msq->q_perm, S_IRUGO))
goto out_unlock;

err = security_msg_queue_msgctl(msq, cmd);
@@ -664,7 +664,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
struct msg_sender s;

err = -EACCES;
- if (ipcperms(&msq->q_perm, S_IWUGO))
+ if (ipcperms(ns, &msq->q_perm, S_IWUGO))
goto out_unlock_free;

err = security_msg_queue_msgsnd(msq, msg, msgflg);
@@ -774,7 +774,7 @@ long do_msgrcv(int msqid, long *pmtype, void __user *mtext,
struct list_head *tmp;

msg = ERR_PTR(-EACCES);
- if (ipcperms(&msq->q_perm, S_IRUGO))
+ if (ipcperms(ns, &msq->q_perm, S_IRUGO))
goto out_unlock;

msg = ERR_PTR(-EAGAIN);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index aa18899..3c3e522 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -15,7 +15,8 @@

#include "util.h"

-static struct ipc_namespace *create_ipc_ns(struct ipc_namespace *old_ns)
+static struct ipc_namespace *create_ipc_ns(struct task_struct *tsk,
+ struct ipc_namespace *old_ns)
{
struct ipc_namespace *ns;
int err;
@@ -44,17 +45,19 @@ static struct ipc_namespace *create_ipc_ns(struct ipc_namespace *old_ns)
ipcns_notify(IPCNS_CREATED);
register_ipcns_notifier(ns);

- ns->user_ns = old_ns->user_ns;
- get_user_ns(ns->user_ns);
+ ns->user_ns = get_user_ns(task_cred_xxx(tsk, user)->user_ns);

return ns;
}

-struct ipc_namespace *copy_ipcs(unsigned long flags, struct ipc_namespace *ns)
+struct ipc_namespace *copy_ipcs(unsigned long flags,
+ struct task_struct *tsk)
{
+ struct ipc_namespace *ns = tsk->nsproxy->ipc_ns;
+
if (!(flags & CLONE_NEWIPC))
return get_ipc_ns(ns);
- return create_ipc_ns(ns);
+ return create_ipc_ns(tsk, ns);
}

/*
diff --git a/ipc/sem.c b/ipc/sem.c
index 0e0d49b..34540f1 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -817,7 +817,7 @@ static int semctl_nolock(struct ipc_namespace *ns, int semid,
}

err = -EACCES;
- if (ipcperms (&sma->sem_perm, S_IRUGO))
+ if (ipcperms (ns, &sma->sem_perm, S_IRUGO))
goto out_unlock;

err = security_sem_semctl(sma, cmd);
@@ -862,7 +862,7 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
nsems = sma->sem_nsems;

err = -EACCES;
- if (ipcperms (&sma->sem_perm, (cmd==SETVAL||cmd==SETALL)?S_IWUGO:S_IRUGO))
+ if (ipcperms (ns, &sma->sem_perm, (cmd==SETVAL||cmd==SETALL)?S_IWUGO:S_IRUGO))
goto out_unlock;

err = security_sem_semctl(sma, cmd);
@@ -1047,7 +1047,8 @@ static int semctl_down(struct ipc_namespace *ns, int semid,
return -EFAULT;
}

- ipcp = ipcctl_pre_down(&sem_ids(ns), semid, cmd, &semid64.sem_perm, 0);
+ ipcp = ipcctl_pre_down(ns, &sem_ids(ns), semid, cmd,
+ &semid64.sem_perm, 0);
if (IS_ERR(ipcp))
return PTR_ERR(ipcp);

@@ -1386,7 +1387,7 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
goto out_unlock_free;

error = -EACCES;
- if (ipcperms(&sma->sem_perm, alter ? S_IWUGO : S_IRUGO))
+ if (ipcperms(ns, &sma->sem_perm, alter ? S_IWUGO : S_IRUGO))
goto out_unlock_free;

error = security_sem_semop(sma, sops, nsops, alter);
diff --git a/ipc/shm.c b/ipc/shm.c
index 7d3bb22..36369e0 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -623,7 +623,8 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
return -EFAULT;
}

- ipcp = ipcctl_pre_down(&shm_ids(ns), shmid, cmd, &shmid64.shm_perm, 0);
+ ipcp = ipcctl_pre_down(ns, &shm_ids(ns), shmid, cmd,
+ &shmid64.shm_perm, 0);
if (IS_ERR(ipcp))
return PTR_ERR(ipcp);

@@ -737,7 +738,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
result = 0;
}
err = -EACCES;
- if (ipcperms (&shp->shm_perm, S_IRUGO))
+ if (ipcperms (ns, &shp->shm_perm, S_IRUGO))
goto out_unlock;
err = security_shm_shmctl(shp, cmd);
if (err)
@@ -773,7 +774,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)

audit_ipc_obj(&(shp->shm_perm));

- if (!capable(CAP_IPC_LOCK)) {
+ if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
uid_t euid = current_euid();
err = -EPERM;
if (euid != shp->shm_perm.uid &&
@@ -888,7 +889,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr)
}

err = -EACCES;
- if (ipcperms(&shp->shm_perm, acc_mode))
+ if (ipcperms(ns, &shp->shm_perm, acc_mode))
goto out_unlock;

err = security_shm_shmat(shp, shmaddr, shmflg);
diff --git a/ipc/util.c b/ipc/util.c
index 69a0cc1..267bb35 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -329,12 +329,14 @@ retry:
*
* It is called with ipc_ids.rw_mutex and ipcp->lock held.
*/
-static int ipc_check_perms(struct kern_ipc_perm *ipcp, struct ipc_ops *ops,
- struct ipc_params *params)
+static int ipc_check_perms(struct ipc_namespace *ns,
+ struct kern_ipc_perm *ipcp,
+ struct ipc_ops *ops,
+ struct ipc_params *params)
{
int err;

- if (ipcperms(ipcp, params->flg))
+ if (ipcperms(ns, ipcp, params->flg))
err = -EACCES;
else {
err = ops->associate(ipcp, params->flg);
@@ -396,7 +398,7 @@ retry:
* ipc_check_perms returns the IPC id on
* success
*/
- err = ipc_check_perms(ipcp, ops, params);
+ err = ipc_check_perms(ns, ipcp, ops, params);
}
ipc_unlock(ipcp);
}
@@ -612,7 +614,7 @@ void ipc_rcu_putref(void *ptr)
* to ipc resources. return 0 if allowed
*/

-int ipcperms (struct kern_ipc_perm *ipcp, short flag)
+int ipcperms (struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flag)
{ /* flag will most probably be 0 or S_...UGO from <linux/stat.h> */
uid_t euid = current_euid();
int requested_mode, granted_mode;
@@ -627,7 +629,7 @@ int ipcperms (struct kern_ipc_perm *ipcp, short flag)
granted_mode >>= 3;
/* is there some bit set in requested_mode but not in granted_mode? */
if ((requested_mode & ~granted_mode & 0007) &&
- !capable(CAP_IPC_OWNER))
+ !ns_capable(ns->user_ns, CAP_IPC_OWNER))
return -1;

return security_ipc_permission(ipcp, flag);
@@ -765,6 +767,7 @@ void ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)

/**
* ipcctl_pre_down - retrieve an ipc and check permissions for some IPC_XXX cmd
+ * @ids: the ipc namespace
* @ids: the table of ids where to look for the ipc
* @id: the id of the ipc to retrieve
* @cmd: the cmd to check
@@ -779,7 +782,8 @@ void ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)
* - returns the ipc with both ipc and rw_mutex locks held in case of success
* or an err-code without any lock held otherwise.
*/
-struct kern_ipc_perm *ipcctl_pre_down(struct ipc_ids *ids, int id, int cmd,
+struct kern_ipc_perm *ipcctl_pre_down(struct ipc_namespace *ns,
+ struct ipc_ids *ids, int id, int cmd,
struct ipc64_perm *perm, int extra_perm)
{
struct kern_ipc_perm *ipcp;
@@ -799,8 +803,8 @@ struct kern_ipc_perm *ipcctl_pre_down(struct ipc_ids *ids, int id, int cmd,
perm->gid, perm->mode);

euid = current_euid();
- if (euid == ipcp->cuid ||
- euid == ipcp->uid || capable(CAP_SYS_ADMIN))
+ if (euid == ipcp->cuid || euid == ipcp->uid ||
+ ns_capable(ns->user_ns, CAP_SYS_ADMIN))
return ipcp;

err = -EPERM;
diff --git a/ipc/util.h b/ipc/util.h
index 764b51a..6f5c20b 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -103,7 +103,7 @@ int ipc_get_maxid(struct ipc_ids *);
void ipc_rmid(struct ipc_ids *, struct kern_ipc_perm *);

/* must be called with ipcp locked */
-int ipcperms(struct kern_ipc_perm *ipcp, short flg);
+int ipcperms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flg);

/* for rare, potentially huge allocations.
* both function can sleep
@@ -126,7 +126,8 @@ struct kern_ipc_perm *ipc_lock(struct ipc_ids *, int);
void kernel_to_ipc64_perm(struct kern_ipc_perm *in, struct ipc64_perm *out);
void ipc64_perm_to_ipc_perm(struct ipc64_perm *in, struct ipc_perm *out);
void ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out);
-struct kern_ipc_perm *ipcctl_pre_down(struct ipc_ids *ids, int id, int cmd,
+struct kern_ipc_perm *ipcctl_pre_down(struct ipc_namespace *ns,
+ struct ipc_ids *ids, int id, int cmd,
struct ipc64_perm *perm, int extra_perm);

#ifndef __ARCH_WANT_IPC_PARSE_VERSION
diff --git a/kernel/futex.c b/kernel/futex.c
index b766d28..1e876f1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2421,10 +2421,19 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
goto err_unlock;
ret = -EPERM;
pcred = __task_cred(p);
+ /* If victim is in different user_ns, then uids are not
+ comparable, so we must have CAP_SYS_PTRACE */
+ if (cred->user->user_ns != pcred->user->user_ns) {
+ if (!ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
+ goto err_unlock;
+ goto ok;
+ }
+ /* If victim is in same user_ns, then uids are comparable */
if (cred->euid != pcred->euid &&
cred->euid != pcred->uid &&
- !capable(CAP_SYS_PTRACE))
+ !ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
goto err_unlock;
+ok:
head = p->robust_list;
rcu_read_unlock();
}
diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
index a7934ac..5f9e689 100644
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@@ -153,10 +153,19 @@ compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
goto err_unlock;
ret = -EPERM;
pcred = __task_cred(p);
+ /* If victim is in different user_ns, then uids are not
+ comparable, so we must have CAP_SYS_PTRACE */
+ if (cred->user->user_ns != pcred->user->user_ns) {
+ if (!ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
+ goto err_unlock;
+ goto ok;
+ }
+ /* If victim is in same user_ns, then uids are comparable */
if (cred->euid != pcred->euid &&
cred->euid != pcred->uid &&
- !capable(CAP_SYS_PTRACE))
+ !ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
goto err_unlock;
+ok:
head = p->compat_robust_list;
rcu_read_unlock();
}
diff --git a/kernel/groups.c b/kernel/groups.c
index 253dc0f..1cc476d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

- if (!capable(CAP_SETGID))
+ if (!nsown_capable(CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index ac8a56e..a05d191 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -75,16 +75,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
goto out_uts;
}

- new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
+ new_nsp->ipc_ns = copy_ipcs(flags, tsk);
if (IS_ERR(new_nsp->ipc_ns)) {
err = PTR_ERR(new_nsp->ipc_ns);
goto out_ipc;
}
- if (new_nsp->ipc_ns != tsk->nsproxy->ipc_ns) {
- put_user_ns(new_nsp->ipc_ns->user_ns);
- new_nsp->ipc_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
- get_user_ns(new_nsp->ipc_ns->user_ns);
- }

new_nsp->pid_ns = copy_pid_ns(flags, task_active_pid_ns(tsk));
if (IS_ERR(new_nsp->pid_ns)) {
diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..dc12bc2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4761,8 +4761,11 @@ static bool check_same_owner(struct task_struct *p)

rcu_read_lock();
pcred = __task_cred(p);
- match = (cred->euid == pcred->euid ||
- cred->euid == pcred->uid);
+ if (cred->user->user_ns == pcred->user->user_ns)
+ match = (cred->euid == pcred->euid ||
+ cred->euid == pcred->uid);
+ else
+ match = false;
rcu_read_unlock();
return match;
}
@@ -5088,7 +5091,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
goto out_free_cpus_allowed;
}
retval = -EPERM;
- if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
+ if (!check_same_owner(p) && !task_ns_capable(p, CAP_SYS_NICE))
goto out_unlock;

retval = security_task_setscheduler(p);
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 4192098..51c6e89 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -189,7 +189,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

- if (!capable(CAP_SETGID))
+ if (!nsown_capable(CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
--
1.7.0.4

2011-02-24 15:03:07

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 09/10] userns: check user namespace for task->file uid equivalence checks

Cheat for now and say all files belong to init_user_ns. Next
step will be to let superblocks belong to a user_ns, and derive
inode_userns(inode) from inode->i_sb->s_user_ns. Finally we'll
introduce more flexible arrangements.

Changelog:
Feb 15: make is_owner_or_cap take const struct inode
Feb 23: make is_owner_or_cap bool

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
fs/inode.c | 17 +++++++++++++++++
fs/namei.c | 20 +++++++++++++++-----
include/linux/fs.h | 9 +++++++--
3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index da85e56..f5ac235 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -25,6 +25,7 @@
#include <linux/async.h>
#include <linux/posix_acl.h>
#include <linux/ima.h>
+#include <linux/cred.h>

/*
* This is needed for the following functions:
@@ -1722,3 +1723,19 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
inode->i_mode = mode;
}
EXPORT_SYMBOL(inode_init_owner);
+
+/*
+ * return true if current either has CAP_FOWNER to the
+ * file, or owns the file.
+ */
+bool is_owner_or_cap(const struct inode *inode)
+{
+ struct user_namespace *ns = inode_userns(inode);
+
+ if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
+ return true;
+ if (ns_capable(ns, CAP_FOWNER))
+ return true;
+ return false;
+}
+EXPORT_SYMBOL(is_owner_or_cap);
diff --git a/fs/namei.c b/fs/namei.c
index 9e701e2..cfac5b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -176,6 +176,9 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag

mask &= MAY_READ | MAY_WRITE | MAY_EXEC;

+ if (current_user_ns() != inode_userns(inode))
+ goto other_perms;
+
if (current_fsuid() == inode->i_uid)
mode >>= 6;
else {
@@ -189,6 +192,7 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag
mode >>= 3;
}

+other_perms:
/*
* If the DACs are ok we don't need any capability check.
*/
@@ -230,7 +234,7 @@ int generic_permission(struct inode *inode, int mask, unsigned int flags,
* Executable DACs are overridable if at least one exec bit is set.
*/
if (!(mask & MAY_EXEC) || execute_ok(inode))
- if (capable(CAP_DAC_OVERRIDE))
+ if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
return 0;

/*
@@ -238,7 +242,7 @@ int generic_permission(struct inode *inode, int mask, unsigned int flags,
*/
mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
if (mask == MAY_READ || (S_ISDIR(inode->i_mode) && !(mask & MAY_WRITE)))
- if (capable(CAP_DAC_READ_SEARCH))
+ if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
return 0;

return -EACCES;
@@ -675,6 +679,7 @@ force_reval_path(struct path *path, struct nameidata *nd)
static inline int exec_permission(struct inode *inode, unsigned int flags)
{
int ret;
+ struct user_namespace *ns = inode_userns(inode);

if (inode->i_op->permission) {
ret = inode->i_op->permission(inode, MAY_EXEC, flags);
@@ -687,7 +692,7 @@ static inline int exec_permission(struct inode *inode, unsigned int flags)
if (ret == -ECHILD)
return ret;

- if (capable(CAP_DAC_OVERRIDE) || capable(CAP_DAC_READ_SEARCH))
+ if (ns_capable(ns, CAP_DAC_OVERRIDE) || ns_capable(ns, CAP_DAC_READ_SEARCH))
goto ok;

return ret;
@@ -1940,11 +1945,15 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)

if (!(dir->i_mode & S_ISVTX))
return 0;
+ if (current_user_ns() != inode_userns(inode))
+ goto other_userns;
if (inode->i_uid == fsuid)
return 0;
if (dir->i_uid == fsuid)
return 0;
- return !capable(CAP_FOWNER);
+
+other_userns:
+ return !ns_capable(inode_userns(inode), CAP_FOWNER);
}

/*
@@ -2635,7 +2644,8 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
if (error)
return error;

- if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
+ if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
+ !ns_capable(inode_userns(dir), CAP_MKNOD))
return -EPERM;

if (!dir->i_op->mknod)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bd32159..eb1ddde 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1446,8 +1446,13 @@ enum {
#define put_fs_excl() atomic_dec(&current->fs_excl)
#define has_fs_excl() atomic_read(&current->fs_excl)

-#define is_owner_or_cap(inode) \
- ((current_fsuid() == (inode)->i_uid) || capable(CAP_FOWNER))
+/*
+ * until VFS tracks user namespaces for inodes, just make all files
+ * belong to init_user_ns
+ */
+extern struct user_namespace init_user_ns;
+#define inode_userns(inode) (&init_user_ns)
+extern bool is_owner_or_cap(const struct inode *inode);

/* not quite ready to be deprecated, but... */
extern void lock_super(struct super_block *);
--
1.7.0.4

2011-02-24 15:03:27

by Serge E. Hallyn

[permalink] [raw]

Subject: [PATCH 10/10] rename is_owner_or_cap to inode_owner_or_capable

And give it a kernel-doc comment.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
fs/9p/acl.c | 2 +-
fs/attr.c | 4 ++--
fs/btrfs/acl.c | 2 +-
fs/btrfs/ioctl.c | 2 +-
fs/ext2/acl.c | 2 +-
fs/ext2/ioctl.c | 6 +++---
fs/ext3/acl.c | 2 +-
fs/ext3/ioctl.c | 6 +++---
fs/ext4/acl.c | 2 +-
fs/ext4/ioctl.c | 8 ++++----
fs/fcntl.c | 2 +-
fs/generic_acl.c | 2 +-
fs/gfs2/file.c | 2 +-
fs/hfsplus/ioctl.c | 2 +-
fs/inode.c | 13 ++++++++-----
fs/jffs2/acl.c | 2 +-
fs/jfs/ioctl.c | 2 +-
fs/jfs/xattr.c | 2 +-
fs/logfs/file.c | 2 +-
fs/namei.c | 2 +-
fs/ocfs2/acl.c | 2 +-
fs/ocfs2/ioctl.c | 2 +-
fs/reiserfs/ioctl.c | 4 ++--
fs/reiserfs/xattr_acl.c | 2 +-
fs/ubifs/ioctl.c | 2 +-
fs/utimes.c | 2 +-
fs/xattr.c | 2 +-
include/linux/fs.h | 2 +-
security/selinux/hooks.c | 2 +-
29 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index 02a2cf6..35243c0 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -315,7 +315,7 @@ static int v9fs_xattr_set_acl(struct dentry *dentry, const char *name,

if (S_ISLNK(inode->i_mode))
return -EOPNOTSUPP;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;
if (value) {
/* update the cached acl value */
diff --git a/fs/attr.c b/fs/attr.c
index 7ca4181..1007ed6 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -59,7 +59,7 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)

/* Make sure a caller can chmod. */
if (ia_valid & ATTR_MODE) {
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;
/* Also check the setgid bit! */
if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
@@ -69,7 +69,7 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)

/* Check for setting the inode time. */
if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;
}

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 9c94934..de34bfa 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -170,7 +170,7 @@ static int btrfs_xattr_acl_set(struct dentry *dentry, const char *name,
int ret;
struct posix_acl *acl = NULL;

- if (!is_owner_or_cap(dentry->d_inode))
+ if (!inode_owner_or_capable(dentry->d_inode))
return -EPERM;

if (!IS_POSIXACL(dentry->d_inode))
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be2d4f6..bf683ef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -158,7 +158,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
FS_SYNC_FL | FS_DIRSYNC_FL))
return -EOPNOTSUPP;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

mutex_lock(&inode->i_mutex);
diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index 7b41805..abea5a1 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -406,7 +406,7 @@ ext2_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
return -EINVAL;
if (!test_opt(dentry->d_sb, POSIX_ACL))
return -EOPNOTSUPP;
- if (!is_owner_or_cap(dentry->d_inode))
+ if (!inode_owner_or_capable(dentry->d_inode))
return -EPERM;

if (value) {
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index e743130..f81e250 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -39,7 +39,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
if (ret)
return ret;

- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
ret = -EACCES;
goto setflags_out;
}
@@ -89,7 +89,7 @@ setflags_out:
case EXT2_IOC_GETVERSION:
return put_user(inode->i_generation, (int __user *) arg);
case EXT2_IOC_SETVERSION:
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;
ret = mnt_want_write(filp->f_path.mnt);
if (ret)
@@ -115,7 +115,7 @@ setflags_out:
if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
return -ENOTTY;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

if (get_user(rsv_window_size, (int __user *)arg))
diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index e4fa49e..9d021c0 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -435,7 +435,7 @@ ext3_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
return -EINVAL;
if (!test_opt(inode->i_sb, POSIX_ACL))
return -EOPNOTSUPP;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

if (value) {
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index fc080dd..f4090bd 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -38,7 +38,7 @@ long ext3_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
unsigned int oldflags;
unsigned int jflag;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

if (get_user(flags, (int __user *) arg))
@@ -123,7 +123,7 @@ flags_out:
__u32 generation;
int err;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

err = mnt_want_write(filp->f_path.mnt);
@@ -192,7 +192,7 @@ setversion_out:
if (err)
return err;

- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
err = -EACCES;
goto setrsvsz_out;
}
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index e0270d1..21eacd7 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -433,7 +433,7 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
return -EINVAL;
if (!test_opt(inode->i_sb, POSIX_ACL))
return -EOPNOTSUPP;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

if (value) {
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index eb3bc2f..a84faa1 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -38,7 +38,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
unsigned int oldflags;
unsigned int jflag;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

if (get_user(flags, (int __user *) arg))
@@ -146,7 +146,7 @@ flags_out:
__u32 generation;
int err;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

err = mnt_want_write(filp->f_path.mnt);
@@ -298,7 +298,7 @@ mext_out:
case EXT4_IOC_MIGRATE:
{
int err;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

err = mnt_want_write(filp->f_path.mnt);
@@ -320,7 +320,7 @@ mext_out:
case EXT4_IOC_ALLOC_DA_BLKS:
{
int err;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

err = mnt_want_write(filp->f_path.mnt);
diff --git a/fs/fcntl.c b/fs/fcntl.c
index cb10261..8ff9afa 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -159,7 +159,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)

/* O_NOATIME can only be set by the owner or superuser */
if ((arg & O_NOATIME) && !(filp->f_flags & O_NOATIME))
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

/* required for strict SunOS emulation */
diff --git a/fs/generic_acl.c b/fs/generic_acl.c
index 06c48a8..8f26d1a 100644
--- a/fs/generic_acl.c
+++ b/fs/generic_acl.c
@@ -74,7 +74,7 @@ generic_acl_set(struct dentry *dentry, const char *name, const void *value,
return -EINVAL;
if (S_ISLNK(inode->i_mode))
return -EOPNOTSUPP;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;
if (value) {
acl = posix_acl_from_xattr(value, size);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 7cfdcb9..ccbcd1c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -221,7 +221,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask)
goto out_drop_write;

error = -EACCES;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
goto out;

error = 0;
diff --git a/fs/hfsplus/ioctl.c b/fs/hfsplus/ioctl.c
index 508ce66..fbaa669 100644
--- a/fs/hfsplus/ioctl.c
+++ b/fs/hfsplus/ioctl.c
@@ -47,7 +47,7 @@ static int hfsplus_ioctl_setflags(struct file *file, int __user *user_flags)
if (err)
goto out;

- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
err = -EACCES;
goto out_drop_write;
}
diff --git a/fs/inode.c b/fs/inode.c
index f5ac235..c200e7a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1724,11 +1724,14 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
}
EXPORT_SYMBOL(inode_init_owner);

-/*
- * return true if current either has CAP_FOWNER to the
- * file, or owns the file.
+/**
+ * inode_owner_or_capable - check current task permissions to inode
+ * @inode: inode being checked
+ *
+ * Return true if current either has CAP_FOWNER to the inode, or
+ * owns the file.
*/
-bool is_owner_or_cap(const struct inode *inode)
+bool inode_owner_or_capable(const struct inode *inode)
{
struct user_namespace *ns = inode_userns(inode);

@@ -1738,4 +1741,4 @@ bool is_owner_or_cap(const struct inode *inode)
return true;
return false;
}
-EXPORT_SYMBOL(is_owner_or_cap);
+EXPORT_SYMBOL(inode_owner_or_capable);
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 95b7967..828a0e1 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -402,7 +402,7 @@ static int jffs2_acl_setxattr(struct dentry *dentry, const char *name,

if (name[0] != '\0')
return -EINVAL;
- if (!is_owner_or_cap(dentry->d_inode))
+ if (!inode_owner_or_capable(dentry->d_inode))
return -EPERM;

if (value) {
diff --git a/fs/jfs/ioctl.c b/fs/jfs/ioctl.c
index afe222b..6f98a18 100644
--- a/fs/jfs/ioctl.c
+++ b/fs/jfs/ioctl.c
@@ -72,7 +72,7 @@ long jfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
if (err)
return err;

- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
err = -EACCES;
goto setflags_out;
}
diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c
index 2d7f165..6c0d276 100644
--- a/fs/jfs/xattr.c
+++ b/fs/jfs/xattr.c
@@ -678,7 +678,7 @@ static int can_set_system_xattr(struct inode *inode, const char *name,
struct posix_acl *acl;
int rc;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

/*
diff --git a/fs/logfs/file.c b/fs/logfs/file.c
index e86376b..c2ad702 100644
--- a/fs/logfs/file.c
+++ b/fs/logfs/file.c
@@ -196,7 +196,7 @@ long logfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
if (IS_RDONLY(inode))
return -EROFS;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

err = get_user(flags, (int __user *)arg);
diff --git a/fs/namei.c b/fs/namei.c
index cfac5b4..8da5a59 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2129,7 +2129,7 @@ int may_open(struct path *path, int acc_mode, int flag)
}

/* O_NOATIME can only be set by the owner or superuser */
- if (flag & O_NOATIME && !is_owner_or_cap(inode))
+ if (flag & O_NOATIME && !inode_owner_or_capable(inode))
return -EPERM;

/*
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index 704f6b1..90f2729 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -497,7 +497,7 @@ static int ocfs2_xattr_set_acl(struct dentry *dentry, const char *name,
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return -EOPNOTSUPP;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

if (value) {
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index 7a48681..09de77c 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -82,7 +82,7 @@ static int ocfs2_set_inode_attr(struct inode *inode, unsigned flags,
}

status = -EACCES;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
goto bail_unlock;

if (!S_ISDIR(inode->i_mode))
diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c
index 79265fd..4e15305 100644
--- a/fs/reiserfs/ioctl.c
+++ b/fs/reiserfs/ioctl.c
@@ -59,7 +59,7 @@ long reiserfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
if (err)
break;

- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
err = -EPERM;
goto setflags_out;
}
@@ -103,7 +103,7 @@ setflags_out:
err = put_user(inode->i_generation, (int __user *)arg);
break;
case REISERFS_IOC_SETVERSION:
- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
err = -EPERM;
break;
}
diff --git a/fs/reiserfs/xattr_acl.c b/fs/reiserfs/xattr_acl.c
index 90d2fcb..3dc38f1 100644
--- a/fs/reiserfs/xattr_acl.c
+++ b/fs/reiserfs/xattr_acl.c
@@ -26,7 +26,7 @@ posix_acl_set(struct dentry *dentry, const char *name, const void *value,
size_t jcreate_blocks;
if (!reiserfs_posixacl(inode->i_sb))
return -EOPNOTSUPP;
- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

if (value) {
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index 8aacd64..548acf4 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -160,7 +160,7 @@ long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
if (IS_RDONLY(inode))
return -EROFS;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EACCES;

if (get_user(flags, (int __user *) arg))
diff --git a/fs/utimes.c b/fs/utimes.c
index 179b586..ba653f3 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -95,7 +95,7 @@ static int utimes_common(struct path *path, struct timespec *times)
if (IS_IMMUTABLE(inode))
goto mnt_drop_write_and_out;

- if (!is_owner_or_cap(inode)) {
+ if (!inode_owner_or_capable(inode)) {
error = inode_permission(inode, MAY_WRITE);
if (error)
goto mnt_drop_write_and_out;
diff --git a/fs/xattr.c b/fs/xattr.c
index 01bb813..a19acdb 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -59,7 +59,7 @@ xattr_permission(struct inode *inode, const char *name, int mask)
if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
return -EPERM;
if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
- (mask & MAY_WRITE) && !is_owner_or_cap(inode))
+ (mask & MAY_WRITE) && !inode_owner_or_capable(inode))
return -EPERM;
}

diff --git a/include/linux/fs.h b/include/linux/fs.h
index eb1ddde..40e05c7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1452,7 +1452,7 @@ enum {
*/
extern struct user_namespace init_user_ns;
#define inode_userns(inode) (&init_user_ns)
-extern bool is_owner_or_cap(const struct inode *inode);
+extern bool inode_owner_or_capable(const struct inode *inode);

/* not quite ready to be deprecated, but... */
extern void lock_super(struct super_block *);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6dcda48..433236f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2721,7 +2721,7 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
if (!(sbsec->flags & SE_SBLABELSUPP))
return -EOPNOTSUPP;

- if (!is_owner_or_cap(inode))
+ if (!inode_owner_or_capable(inode))
return -EPERM;

COMMON_AUDIT_DATA_INIT(&ad, FS);
--
1.7.0.4

2011-02-24 16:59:12

by David Howells

[permalink] [raw]

Subject: Re: [PATCH 05/10] Allow ptrace from non-init user namespaces

Serge E. Hallyn <[email protected]> wrote:

> ptrace is allowed to tasks in the same user namespace according to

That's still missing a verb.

2011-02-24 17:03:25

by David Howells

[permalink] [raw]

Subject: Re: [PATCH 01/10] Add a user_namespace as creator/owner of uts_namespace

Feel free to add my Acked-by to these patches, give or take the correction to
the change description of the 5th patch.

David

2011-03-01 00:28:45

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 01/10] Add a user_namespace as creator/owner of uts_namespace

On Thu, 24 Feb 2011 15:01:51 +0000
"Serge E. Hallyn" <[email protected]> wrote:

> Cc: [email protected], [email protected]

I don't think those addresses do what you think they do.

> copy_process() handles CLONE_NEWUSER before the rest of the
> namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
> the new uts namespace will have the new user namespace as its
> owner. That is what we want, since we want root in that new
> userns to be able to have privilege over it.
>

Well this sucks. Anyone who is reading this patch series really won't
have a clue what any of it is for. There's no context provided.

A useful way of thinking about this is to ask yourself "what will Linus
think when this stuff hits his inbox". If the answer is "he'll say
wtf" then we're doing it wrong.

Sigh.

I shall (again) paste in the below text, which I snarfed from the wiki.
Please check that it is complete, accurate and adequate. If not,
please send along replacement text.

: The expected course of development for user namespaces targeted
: capabilities is laid out at https://wiki.ubuntu.com/UserNamespace.
:
: Goals:
:
: - Make it safe for an unprivileged user to unshare namespaces. They
: will be privileged with respect to the new namespace, but this should
: only include resources which the unprivileged user already owns.
:
: - Provide separate limits and accounting for userids in different
: namespaces.
:
: Status:
:
: Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to
: get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and
: CAP_SETGID capabilities. What this gets you is a whole new set of
: userids, meaning that user 500 will have a different 'struct user' in
: your namespace than in other namespaces. So any accounting information
: stored in struct user will be unique to your namespace.
:
: However, throughout the kernel there are checks which
:
: - simply check for a capability. Since root in a child namespace
: has all capabilities, this means that a child namespace is not
: constrained.
:
: - simply compare uid1 == uid2. Since these are the integer uids,
: uid 500 in namespace 1 will be said to be equal to uid 500 in
: namespace 2.
:
: As a result, the lxc implementation at lxc.sf.net does not use user
: namespaces. This is actually helpful because it leaves us free to
: develop user namespaces in such a way that, for some time, user
: namespaces may be unuseful.
:
:
: Bugs aside, this patchset is supposed to not at all affect systems which
: are not actively using user namespaces, and only restrict what tasks in
: child user namespace can do. They begin to limit privilege to a user
: namespace, so that root in a container cannot kill or ptrace tasks in the
: parent user namespace, and can only get world access rights to files.
: Since all files currently belong to the initila user namespace, that means
: that child user namespaces can only get world access rights to *all*
: files. While this temporarily makes user namespaces bad for system
: containers, it starts to get useful for some sandboxing.
:
: I've run the 'runltplite.sh' with and without this patchset and found no
: difference.

2011-03-01 05:37:16

by Serge Hallyn

[permalink] [raw]

Subject: Re: [PATCH 01/10] Add a user_namespace as creator/owner of uts_namespace

Quoting Andrew Morton ([email protected]):
> On Thu, 24 Feb 2011 15:01:51 +0000
> "Serge E. Hallyn" <[email protected]> wrote:
>
> > Cc: [email protected], [email protected]
>
> I don't think those addresses do what you think they do.

!*&$(*&*(7!

> > copy_process() handles CLONE_NEWUSER before the rest of the
> > namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
> > the new uts namespace will have the new user namespace as its
> > owner. That is what we want, since we want root in that new
> > userns to be able to have privilege over it.
> >
>
> Well this sucks. Anyone who is reading this patch series really won't
> have a clue what any of it is for. There's no context provided.
>
> A useful way of thinking about this is to ask yourself "what will Linus
> think when this stuff hits his inbox". If the answer is "he'll say
> wtf" then we're doing it wrong.
>
> Sigh.
>
> I shall (again) paste in the below text, which I snarfed from the wiki.
> Please check that it is complete, accurate and adequate. If not,
> please send along replacement text.

Sorry. Yes, that's good.

thanks,
-serge

> : The expected course of development for user namespaces targeted
> : capabilities is laid out at https://wiki.ubuntu.com/UserNamespace.
> :
> : Goals:
> :
> : - Make it safe for an unprivileged user to unshare namespaces. They
> : will be privileged with respect to the new namespace, but this should
> : only include resources which the unprivileged user already owns.
> :
> : - Provide separate limits and accounting for userids in different
> : namespaces.
> :
> : Status:
> :
> : Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to
> : get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and
> : CAP_SETGID capabilities. What this gets you is a whole new set of
> : userids, meaning that user 500 will have a different 'struct user' in
> : your namespace than in other namespaces. So any accounting information
> : stored in struct user will be unique to your namespace.
> :
> : However, throughout the kernel there are checks which
> :
> : - simply check for a capability. Since root in a child namespace
> : has all capabilities, this means that a child namespace is not
> : constrained.
> :
> : - simply compare uid1 == uid2. Since these are the integer uids,
> : uid 500 in namespace 1 will be said to be equal to uid 500 in
> : namespace 2.
> :
> : As a result, the lxc implementation at lxc.sf.net does not use user
> : namespaces. This is actually helpful because it leaves us free to
> : develop user namespaces in such a way that, for some time, user
> : namespaces may be unuseful.
> :
> :
> : Bugs aside, this patchset is supposed to not at all affect systems which
> : are not actively using user namespaces, and only restrict what tasks in
> : child user namespace can do. They begin to limit privilege to a user
> : namespace, so that root in a container cannot kill or ptrace tasks in the
> : parent user namespace, and can only get world access rights to files.
> : Since all files currently belong to the initila user namespace, that means
> : that child user namespaces can only get world access rights to *all*
> : files. While this temporarily makes user namespaces bad for system
> : containers, it starts to get useful for some sandboxing.
> :
> : I've run the 'runltplite.sh' with and without this patchset and found no
> : difference.
>
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

2011-03-01 22:25:00

by Nathan Lynch

[permalink] [raw]

Subject: Re: [PATCH 09/10] userns: check user namespace for task->file uid equivalence checks

On Thu, 2011-02-24 at 15:03 +0000, Serge E. Hallyn wrote:
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1446,8 +1446,13 @@ enum {
> #define put_fs_excl() atomic_dec(&current->fs_excl)
> #define has_fs_excl() atomic_read(&current->fs_excl)
>
> -#define is_owner_or_cap(inode) \
> - ((current_fsuid() == (inode)->i_uid) || capable(CAP_FOWNER))
> +/*
> + * until VFS tracks user namespaces for inodes, just make all files
> + * belong to init_user_ns
> + */
> +extern struct user_namespace init_user_ns;

init_user_ns gets declared in fs.h in this patch, utsname.h in patch #1,
capability.h in #2, ipc_namespace.h in #7. Could this declaration be
kept to a single header?

2011-03-01 23:07:33

by Serge Hallyn

[permalink] [raw]

Subject: Re: [PATCH 09/10] userns: check user namespace for task->file uid equivalence checks

Quoting Nathan Lynch ([email protected]):
> On Thu, 2011-02-24 at 15:03 +0000, Serge E. Hallyn wrote:
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1446,8 +1446,13 @@ enum {
> > #define put_fs_excl() atomic_dec(&current->fs_excl)
> > #define has_fs_excl() atomic_read(&current->fs_excl)
> >
> > -#define is_owner_or_cap(inode) \
> > - ((current_fsuid() == (inode)->i_uid) || capable(CAP_FOWNER))
> > +/*
> > + * until VFS tracks user namespaces for inodes, just make all files
> > + * belong to init_user_ns
> > + */
> > +extern struct user_namespace init_user_ns;
>
> init_user_ns gets declared in fs.h in this patch, utsname.h in patch #1,
> capability.h in #2, ipc_namespace.h in #7. Could this declaration be
> kept to a single header?
>

ipc/msgutil.c includes security.h which includes fs.h, so we should be
able to drop the one in ipc_namespace.h. The one in utsname.h is
there for init/version.c and needed AFAICS.

The one in capability.h should be able to go when has_capability* are
turned into functions. They couldn't be turned into static functions
in capability.h (left as exercise for reader), but they can be made
full-fledged functions in kernel/capability.c. I will do that in a
follow-on patch and try to remove the extra init_user_ns defines as
well.

thanks,
-serge