2013-08-29 23:52:28

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH 0/5] A couple of lingering namespace patches


There are a couple of long overdue namespace patches, simple cleanups
and permision grants that have been sitting in my development tree
for far too long. If anyone objects to these please let me know.

Eric W. Biederman (4):
namespaces: Simplify copy_namespaces so it is clear what is going on.
userns: Allow PR_CAPBSET_DROP in a user namespace.
pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
userns: Kill nsown_capable it makes the wrong thing easy

Serge Hallyn (1):
capabilities: allow nice if we are privileged

fs/namespace.c | 4 ++--
fs/open.c | 2 +-
include/linux/capability.h | 1 -
ipc/namespace.c | 2 +-
kernel/capability.c | 12 ------------
kernel/fork.c | 5 -----
kernel/groups.c | 2 +-
kernel/nsproxy.c | 35 +++++++++++------------------------
kernel/pid_namespace.c | 2 +-
kernel/sys.c | 20 ++++++++++----------
kernel/uid16.c | 2 +-
kernel/utsname.c | 2 +-
net/core/net_namespace.c | 2 +-
net/core/scm.c | 4 ++--
security/commoncap.c | 10 +++++-----
15 files changed, 37 insertions(+), 68 deletions(-)

Eric


2013-08-29 23:53:26

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on.


Remove the test for the impossible case where tsk->nsproxy == NULL. Fork
will never be called with tsk->nsproxy == NULL.

Only call get_nsproxy when we don't need to generate a new_nsproxy,
and mark the case where we don't generate a new nsproxy as likely.

Remove the code to drop an unnecessarily acquired nsproxy value.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/nsproxy.c | 35 +++++++++++------------------------
1 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index d9afd2563..a1ed011 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -125,22 +125,16 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
struct nsproxy *old_ns = tsk->nsproxy;
struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
struct nsproxy *new_ns;
- int err = 0;

- if (!old_ns)
+ if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
+ CLONE_NEWPID | CLONE_NEWNET)))) {
+ get_nsproxy(old_ns);
return 0;
-
- get_nsproxy(old_ns);
-
- if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWPID | CLONE_NEWNET)))
- return 0;
-
- if (!ns_capable(user_ns, CAP_SYS_ADMIN)) {
- err = -EPERM;
- goto out;
}

+ if (!ns_capable(user_ns, CAP_SYS_ADMIN))
+ return -EPERM;
+
/*
* CLONE_NEWIPC must detach from the undolist: after switching
* to a new ipc namespace, the semaphore arrays from the old
@@ -149,22 +143,15 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
* it along with CLONE_NEWIPC.
*/
if ((flags & (CLONE_NEWIPC | CLONE_SYSVSEM)) ==
- (CLONE_NEWIPC | CLONE_SYSVSEM)) {
- err = -EINVAL;
- goto out;
- }
+ (CLONE_NEWIPC | CLONE_SYSVSEM))
+ return -EINVAL;

new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
- if (IS_ERR(new_ns)) {
- err = PTR_ERR(new_ns);
- goto out;
- }
+ if (IS_ERR(new_ns))
+ return PTR_ERR(new_ns);

tsk->nsproxy = new_ns;
-
-out:
- put_nsproxy(old_ns);
- return err;
+ return 0;
}

void free_nsproxy(struct nsproxy *ns)
--
1.7.5.4

2013-08-29 23:54:12

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace.


As the capabilites and capability bounding set are per user namespace
properties it is safe to allow changing them with just CAP_SETPCAP
permission in the user namespace.

Signed-off-by: "Eric W. Biederman" <[email protected]>
Tested-by: Richard Weinberger <[email protected]>
---
security/commoncap.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index c44b6fe..9fccf71 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -824,7 +824,7 @@ int cap_task_setnice(struct task_struct *p, int nice)
*/
static long cap_prctl_drop(struct cred *new, unsigned long cap)
{
- if (!capable(CAP_SETPCAP))
+ if (!ns_capable(current_user_ns(), CAP_SETPCAP))
return -EPERM;
if (!cap_valid(cap))
return -EINVAL;
--
1.7.5.4

2013-08-29 23:55:13

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD


I goofed when I made unshare(CLONE_NEWPID) only work in a
single-threaded process. There is no need for that requirement and in
fact I analyzied things right for setns. The hard requirement
is for tasks that share a VM to all be in the pid namespace and
we properly prevent that in do_fork.

Just to be certain I took a look through do_wait and
forget_original_parent and there are no cases that make it any harder
for children to be in the multiple pid namespaces than it is for
children to be in the same pid namespace. I also performed a check to
see if there were in uses of task->nsproxy_pid_ns I was not familiar
with, but it is only used when allocating a new pid for a new task,
and in checks to prevent craziness from happening.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/fork.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 66635c8..eb45f1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1818,11 +1818,6 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
if (unshare_flags & CLONE_NEWUSER)
unshare_flags |= CLONE_THREAD | CLONE_FS;
/*
- * If unsharing a pid namespace must also unshare the thread.
- */
- if (unshare_flags & CLONE_NEWPID)
- unshare_flags |= CLONE_THREAD;
- /*
* If unsharing a thread from a thread group, must also unshare vm.
*/
if (unshare_flags & CLONE_THREAD)
--
1.7.5.4

2013-08-29 23:55:46

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH 4/5] capabilities: allow nice if we are privileged


We allow task A to change B's nice level if it has a supserset of
B's privileges, or of it has CAP_SYS_NICE. Also allow it if A has
CAP_SYS_NICE with respect to B - meaning it is root in the same
namespace, or it created B's namespace.

Signed-off-by: Serge Hallyn <[email protected]>
Reviewed-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
security/commoncap.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 9fccf71..b9d613e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -768,16 +768,16 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
*/
static int cap_safe_nice(struct task_struct *p)
{
- int is_subset;
+ int is_subset, ret = 0;

rcu_read_lock();
is_subset = cap_issubset(__task_cred(p)->cap_permitted,
current_cred()->cap_permitted);
+ if (!is_subset && !ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE))
+ ret = -EPERM;
rcu_read_unlock();

- if (!is_subset && !capable(CAP_SYS_NICE))
- return -EPERM;
- return 0;
+ return ret;
}

/**
--
1.7.5.4

2013-08-29 23:57:01

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH 5/5] userns: Kill nsown_capable it makes the wrong thing easy


nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and
CAP_SETGID. For the existing users it doesn't noticably simplify things and
from the suggested patches I have seen it encourages people to do the wrong
thing. So remove nsown_capable.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/namespace.c | 4 ++--
fs/open.c | 2 +-
include/linux/capability.h | 1 -
ipc/namespace.c | 2 +-
kernel/capability.c | 12 ------------
kernel/groups.c | 2 +-
kernel/pid_namespace.c | 2 +-
kernel/sys.c | 20 ++++++++++----------
kernel/uid16.c | 2 +-
kernel/utsname.c | 2 +-
net/core/net_namespace.c | 2 +-
net/core/scm.c | 4 ++--
12 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 877e427..dc519a1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2929,8 +2929,8 @@ static int mntns_install(struct nsproxy *nsproxy, void *ns)
struct path root;

if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
- !nsown_capable(CAP_SYS_CHROOT) ||
- !nsown_capable(CAP_SYS_ADMIN))
+ !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
+ !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;

if (fs->users != 1)
diff --git a/fs/open.c b/fs/open.c
index 9156cb0..2a57580 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -443,7 +443,7 @@ retry:
goto dput_and_out;

error = -EPERM;
- if (!nsown_capable(CAP_SYS_CHROOT))
+ if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
goto dput_and_out;
error = security_path_chroot(&path);
if (error)
diff --git a/include/linux/capability.h b/include/linux/capability.h
index d9a4f7f..a6ee1f9 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -210,7 +210,6 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
struct user_namespace *ns, int cap);
extern bool capable(int cap);
extern bool ns_capable(struct user_namespace *ns, int cap);
-extern bool nsown_capable(int cap);
extern bool inode_capable(const struct inode *inode, int cap);
extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);

diff --git a/ipc/namespace.c b/ipc/namespace.c
index 7ee61bf..4be6581 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -171,7 +171,7 @@ static int ipcns_install(struct nsproxy *nsproxy, void *new)
{
struct ipc_namespace *ns = new;
if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
- !nsown_capable(CAP_SYS_ADMIN))
+ !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;

/* Ditch state from the old ipc namespace */
diff --git a/kernel/capability.c b/kernel/capability.c
index f6c2ce5..6fc1c8a 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -433,18 +433,6 @@ bool capable(int cap)
EXPORT_SYMBOL(capable);

/**
- * nsown_capable - Check superior capability to one's own user_ns
- * @cap: The capability in question
- *
- * Return true if the current task has the given superior capability
- * targeted at its own user namespace.
- */
-bool nsown_capable(int cap)
-{
- return ns_capable(current_user_ns(), cap);
-}
-
-/**
* inode_capable - Check superior capability over inode
* @inode: The inode in question
* @cap: The capability in question
diff --git a/kernel/groups.c b/kernel/groups.c
index 6b2588d..90cf1c3 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

- if (!nsown_capable(CAP_SETGID))
+ if (!ns_capable(current_user_ns(), CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 6917e8e..ee1f6bb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -329,7 +329,7 @@ static int pidns_install(struct nsproxy *nsproxy, void *ns)
struct pid_namespace *ancestor, *new = ns;

if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
- !nsown_capable(CAP_SYS_ADMIN))
+ !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;

/*
diff --git a/kernel/sys.c b/kernel/sys.c
index 771129b..c18ecca 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -337,7 +337,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
if (rgid != (gid_t) -1) {
if (gid_eq(old->gid, krgid) ||
gid_eq(old->egid, krgid) ||
- nsown_capable(CAP_SETGID))
+ ns_capable(old->user_ns, CAP_SETGID))
new->gid = krgid;
else
goto error;
@@ -346,7 +346,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
if (gid_eq(old->gid, kegid) ||
gid_eq(old->egid, kegid) ||
gid_eq(old->sgid, kegid) ||
- nsown_capable(CAP_SETGID))
+ ns_capable(old->user_ns, CAP_SETGID))
new->egid = kegid;
else
goto error;
@@ -387,7 +387,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
old = current_cred();

retval = -EPERM;
- if (nsown_capable(CAP_SETGID))
+ if (ns_capable(old->user_ns, CAP_SETGID))
new->gid = new->egid = new->sgid = new->fsgid = kgid;
else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
new->egid = new->fsgid = kgid;
@@ -471,7 +471,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
new->uid = kruid;
if (!uid_eq(old->uid, kruid) &&
!uid_eq(old->euid, kruid) &&
- !nsown_capable(CAP_SETUID))
+ !ns_capable(old->user_ns, CAP_SETUID))
goto error;
}

@@ -480,7 +480,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
if (!uid_eq(old->uid, keuid) &&
!uid_eq(old->euid, keuid) &&
!uid_eq(old->suid, keuid) &&
- !nsown_capable(CAP_SETUID))
+ !ns_capable(old->user_ns, CAP_SETUID))
goto error;
}

@@ -534,7 +534,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
old = current_cred();

retval = -EPERM;
- if (nsown_capable(CAP_SETUID)) {
+ if (ns_capable(old->user_ns, CAP_SETUID)) {
new->suid = new->uid = kuid;
if (!uid_eq(kuid, old->uid)) {
retval = set_user(new);
@@ -591,7 +591,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
old = current_cred();

retval = -EPERM;
- if (!nsown_capable(CAP_SETUID)) {
+ if (!ns_capable(old->user_ns, CAP_SETUID)) {
if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
!uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
goto error;
@@ -673,7 +673,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
old = current_cred();

retval = -EPERM;
- if (!nsown_capable(CAP_SETGID)) {
+ if (!ns_capable(old->user_ns, CAP_SETGID)) {
if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
!gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
goto error;
@@ -744,7 +744,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)

if (uid_eq(kuid, old->uid) || uid_eq(kuid, old->euid) ||
uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
- nsown_capable(CAP_SETUID)) {
+ ns_capable(old->user_ns, CAP_SETUID)) {
if (!uid_eq(kuid, old->fsuid)) {
new->fsuid = kuid;
if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
@@ -783,7 +783,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)

if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->egid) ||
gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
- nsown_capable(CAP_SETGID)) {
+ ns_capable(old->user_ns, CAP_SETGID)) {
if (!gid_eq(kgid, old->fsgid)) {
new->fsgid = kgid;
goto change_okay;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index f6c83d7..602e5bb 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,7 +176,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

- if (!nsown_capable(CAP_SETGID))
+ if (!ns_capable(current_user_ns(), CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 2fc8576..fd39312 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -114,7 +114,7 @@ static int utsns_install(struct nsproxy *nsproxy, void *new)
struct uts_namespace *ns = new;

if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
- !nsown_capable(CAP_SYS_ADMIN))
+ !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;

get_uts_ns(ns);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f9765203..81d3a9a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -651,7 +651,7 @@ static int netns_install(struct nsproxy *nsproxy, void *ns)
struct net *net = ns;

if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
- !nsown_capable(CAP_SYS_ADMIN))
+ !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;

put_net(nsproxy->net_ns);
diff --git a/net/core/scm.c b/net/core/scm.c
index 03795d0..c346f58 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -56,9 +56,9 @@ static __inline__ int scm_check_creds(struct ucred *creds)
if ((creds->pid == task_tgid_vnr(current) ||
ns_capable(current->nsproxy->pid_ns->user_ns, CAP_SYS_ADMIN)) &&
((uid_eq(uid, cred->uid) || uid_eq(uid, cred->euid) ||
- uid_eq(uid, cred->suid)) || nsown_capable(CAP_SETUID)) &&
+ uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
((gid_eq(gid, cred->gid) || gid_eq(gid, cred->egid) ||
- gid_eq(gid, cred->sgid)) || nsown_capable(CAP_SETGID))) {
+ gid_eq(gid, cred->sgid)) || ns_capable(cred->user_ns, CAP_SETGID))) {
return 0;
}
return -EPERM;
--
1.7.5.4

2013-08-30 01:14:16

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 5/5] userns: Kill nsown_capable it makes the wrong thing easy

Quoting Eric W. Biederman ([email protected]):
>
> nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and
> CAP_SETGID. For the existing users it doesn't noticably simplify things and
> from the suggested patches I have seen it encourages people to do the wrong
> thing. So remove nsown_capable.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Yeah I've had the same thought before. You rarely want nsown_capable, and
it wants to be mis-used.

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/namespace.c | 4 ++--
> fs/open.c | 2 +-
> include/linux/capability.h | 1 -
> ipc/namespace.c | 2 +-
> kernel/capability.c | 12 ------------
> kernel/groups.c | 2 +-
> kernel/pid_namespace.c | 2 +-
> kernel/sys.c | 20 ++++++++++----------
> kernel/uid16.c | 2 +-
> kernel/utsname.c | 2 +-
> net/core/net_namespace.c | 2 +-
> net/core/scm.c | 4 ++--
> 12 files changed, 21 insertions(+), 34 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 877e427..dc519a1 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2929,8 +2929,8 @@ static int mntns_install(struct nsproxy *nsproxy, void *ns)
> struct path root;
>
> if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
> - !nsown_capable(CAP_SYS_CHROOT) ||
> - !nsown_capable(CAP_SYS_ADMIN))
> + !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
> + !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> return -EPERM;
>
> if (fs->users != 1)
> diff --git a/fs/open.c b/fs/open.c
> index 9156cb0..2a57580 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -443,7 +443,7 @@ retry:
> goto dput_and_out;
>
> error = -EPERM;
> - if (!nsown_capable(CAP_SYS_CHROOT))
> + if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> goto dput_and_out;
> error = security_path_chroot(&path);
> if (error)
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index d9a4f7f..a6ee1f9 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -210,7 +210,6 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
> struct user_namespace *ns, int cap);
> extern bool capable(int cap);
> extern bool ns_capable(struct user_namespace *ns, int cap);
> -extern bool nsown_capable(int cap);
> extern bool inode_capable(const struct inode *inode, int cap);
> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index 7ee61bf..4be6581 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -171,7 +171,7 @@ static int ipcns_install(struct nsproxy *nsproxy, void *new)
> {
> struct ipc_namespace *ns = new;
> if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> - !nsown_capable(CAP_SYS_ADMIN))
> + !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> return -EPERM;
>
> /* Ditch state from the old ipc namespace */
> diff --git a/kernel/capability.c b/kernel/capability.c
> index f6c2ce5..6fc1c8a 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -433,18 +433,6 @@ bool capable(int cap)
> EXPORT_SYMBOL(capable);
>
> /**
> - * nsown_capable - Check superior capability to one's own user_ns
> - * @cap: The capability in question
> - *
> - * Return true if the current task has the given superior capability
> - * targeted at its own user namespace.
> - */
> -bool nsown_capable(int cap)
> -{
> - return ns_capable(current_user_ns(), cap);
> -}
> -
> -/**
> * inode_capable - Check superior capability over inode
> * @inode: The inode in question
> * @cap: The capability in question
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 6b2588d..90cf1c3 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
> struct group_info *group_info;
> int retval;
>
> - if (!nsown_capable(CAP_SETGID))
> + if (!ns_capable(current_user_ns(), CAP_SETGID))
> return -EPERM;
> if ((unsigned)gidsetsize > NGROUPS_MAX)
> return -EINVAL;
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 6917e8e..ee1f6bb 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -329,7 +329,7 @@ static int pidns_install(struct nsproxy *nsproxy, void *ns)
> struct pid_namespace *ancestor, *new = ns;
>
> if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
> - !nsown_capable(CAP_SYS_ADMIN))
> + !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> return -EPERM;
>
> /*
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 771129b..c18ecca 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -337,7 +337,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
> if (rgid != (gid_t) -1) {
> if (gid_eq(old->gid, krgid) ||
> gid_eq(old->egid, krgid) ||
> - nsown_capable(CAP_SETGID))
> + ns_capable(old->user_ns, CAP_SETGID))
> new->gid = krgid;
> else
> goto error;
> @@ -346,7 +346,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
> if (gid_eq(old->gid, kegid) ||
> gid_eq(old->egid, kegid) ||
> gid_eq(old->sgid, kegid) ||
> - nsown_capable(CAP_SETGID))
> + ns_capable(old->user_ns, CAP_SETGID))
> new->egid = kegid;
> else
> goto error;
> @@ -387,7 +387,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
> old = current_cred();
>
> retval = -EPERM;
> - if (nsown_capable(CAP_SETGID))
> + if (ns_capable(old->user_ns, CAP_SETGID))
> new->gid = new->egid = new->sgid = new->fsgid = kgid;
> else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
> new->egid = new->fsgid = kgid;
> @@ -471,7 +471,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
> new->uid = kruid;
> if (!uid_eq(old->uid, kruid) &&
> !uid_eq(old->euid, kruid) &&
> - !nsown_capable(CAP_SETUID))
> + !ns_capable(old->user_ns, CAP_SETUID))
> goto error;
> }
>
> @@ -480,7 +480,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
> if (!uid_eq(old->uid, keuid) &&
> !uid_eq(old->euid, keuid) &&
> !uid_eq(old->suid, keuid) &&
> - !nsown_capable(CAP_SETUID))
> + !ns_capable(old->user_ns, CAP_SETUID))
> goto error;
> }
>
> @@ -534,7 +534,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
> old = current_cred();
>
> retval = -EPERM;
> - if (nsown_capable(CAP_SETUID)) {
> + if (ns_capable(old->user_ns, CAP_SETUID)) {
> new->suid = new->uid = kuid;
> if (!uid_eq(kuid, old->uid)) {
> retval = set_user(new);
> @@ -591,7 +591,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
> old = current_cred();
>
> retval = -EPERM;
> - if (!nsown_capable(CAP_SETUID)) {
> + if (!ns_capable(old->user_ns, CAP_SETUID)) {
> if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
> !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
> goto error;
> @@ -673,7 +673,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
> old = current_cred();
>
> retval = -EPERM;
> - if (!nsown_capable(CAP_SETGID)) {
> + if (!ns_capable(old->user_ns, CAP_SETGID)) {
> if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
> !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
> goto error;
> @@ -744,7 +744,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
>
> if (uid_eq(kuid, old->uid) || uid_eq(kuid, old->euid) ||
> uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
> - nsown_capable(CAP_SETUID)) {
> + ns_capable(old->user_ns, CAP_SETUID)) {
> if (!uid_eq(kuid, old->fsuid)) {
> new->fsuid = kuid;
> if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
> @@ -783,7 +783,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
>
> if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->egid) ||
> gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
> - nsown_capable(CAP_SETGID)) {
> + ns_capable(old->user_ns, CAP_SETGID)) {
> if (!gid_eq(kgid, old->fsgid)) {
> new->fsgid = kgid;
> goto change_okay;
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index f6c83d7..602e5bb 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -176,7 +176,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> struct group_info *group_info;
> int retval;
>
> - if (!nsown_capable(CAP_SETGID))
> + if (!ns_capable(current_user_ns(), CAP_SETGID))
> return -EPERM;
> if ((unsigned)gidsetsize > NGROUPS_MAX)
> return -EINVAL;
> diff --git a/kernel/utsname.c b/kernel/utsname.c
> index 2fc8576..fd39312 100644
> --- a/kernel/utsname.c
> +++ b/kernel/utsname.c
> @@ -114,7 +114,7 @@ static int utsns_install(struct nsproxy *nsproxy, void *new)
> struct uts_namespace *ns = new;
>
> if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> - !nsown_capable(CAP_SYS_ADMIN))
> + !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> return -EPERM;
>
> get_uts_ns(ns);
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index f9765203..81d3a9a 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -651,7 +651,7 @@ static int netns_install(struct nsproxy *nsproxy, void *ns)
> struct net *net = ns;
>
> if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
> - !nsown_capable(CAP_SYS_ADMIN))
> + !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> return -EPERM;
>
> put_net(nsproxy->net_ns);
> diff --git a/net/core/scm.c b/net/core/scm.c
> index 03795d0..c346f58 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -56,9 +56,9 @@ static __inline__ int scm_check_creds(struct ucred *creds)
> if ((creds->pid == task_tgid_vnr(current) ||
> ns_capable(current->nsproxy->pid_ns->user_ns, CAP_SYS_ADMIN)) &&
> ((uid_eq(uid, cred->uid) || uid_eq(uid, cred->euid) ||
> - uid_eq(uid, cred->suid)) || nsown_capable(CAP_SETUID)) &&
> + uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
> ((gid_eq(gid, cred->gid) || gid_eq(gid, cred->egid) ||
> - gid_eq(gid, cred->sgid)) || nsown_capable(CAP_SETGID))) {
> + gid_eq(gid, cred->sgid)) || ns_capable(cred->user_ns, CAP_SETGID))) {
> return 0;
> }
> return -EPERM;
> --
> 1.7.5.4

2013-08-30 01:15:16

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace.

Quoting Eric W. Biederman ([email protected]):
>
> As the capabilites and capability bounding set are per user namespace
> properties it is safe to allow changing them with just CAP_SETPCAP
> permission in the user namespace.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> Tested-by: Richard Weinberger <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> security/commoncap.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/security/commoncap.c b/security/commoncap.c
> index c44b6fe..9fccf71 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -824,7 +824,7 @@ int cap_task_setnice(struct task_struct *p, int nice)
> */
> static long cap_prctl_drop(struct cred *new, unsigned long cap)
> {
> - if (!capable(CAP_SETPCAP))
> + if (!ns_capable(current_user_ns(), CAP_SETPCAP))
> return -EPERM;
> if (!cap_valid(cap))
> return -EINVAL;
> --
> 1.7.5.4

2013-08-30 16:10:09

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on.

Quoting Eric W. Biederman ([email protected]):
>
> Remove the test for the impossible case where tsk->nsproxy == NULL. Fork
> will never be called with tsk->nsproxy == NULL.
>
> Only call get_nsproxy when we don't need to generate a new_nsproxy,
> and mark the case where we don't generate a new nsproxy as likely.
>
> Remove the code to drop an unnecessarily acquired nsproxy value.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> kernel/nsproxy.c | 35 +++++++++++------------------------
> 1 files changed, 11 insertions(+), 24 deletions(-)
>
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index d9afd2563..a1ed011 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -125,22 +125,16 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
> struct nsproxy *old_ns = tsk->nsproxy;
> struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
> struct nsproxy *new_ns;
> - int err = 0;
>
> - if (!old_ns)
> + if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> + CLONE_NEWPID | CLONE_NEWNET)))) {
> + get_nsproxy(old_ns);
> return 0;
> -
> - get_nsproxy(old_ns);
> -
> - if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> - CLONE_NEWPID | CLONE_NEWNET)))
> - return 0;
> -
> - if (!ns_capable(user_ns, CAP_SYS_ADMIN)) {
> - err = -EPERM;
> - goto out;
> }
>
> + if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> + return -EPERM;
> +
> /*
> * CLONE_NEWIPC must detach from the undolist: after switching
> * to a new ipc namespace, the semaphore arrays from the old
> @@ -149,22 +143,15 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
> * it along with CLONE_NEWIPC.
> */
> if ((flags & (CLONE_NEWIPC | CLONE_SYSVSEM)) ==
> - (CLONE_NEWIPC | CLONE_SYSVSEM)) {
> - err = -EINVAL;
> - goto out;
> - }
> + (CLONE_NEWIPC | CLONE_SYSVSEM))
> + return -EINVAL;
>
> new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
> - if (IS_ERR(new_ns)) {
> - err = PTR_ERR(new_ns);
> - goto out;
> - }
> + if (IS_ERR(new_ns))
> + return PTR_ERR(new_ns);
>
> tsk->nsproxy = new_ns;
> -
> -out:
> - put_nsproxy(old_ns);
> - return err;
> + return 0;
> }
>
> void free_nsproxy(struct nsproxy *ns)
> --
> 1.7.5.4

2013-08-30 16:38:07

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD

Quoting Eric W. Biederman ([email protected]):
>
> I goofed when I made unshare(CLONE_NEWPID) only work in a
> single-threaded process. There is no need for that requirement and in
> fact I analyzied things right for setns. The hard requirement
> is for tasks that share a VM to all be in the pid namespace and
> we properly prevent that in do_fork.

I don't understand though - copy_process does have the right test:

1176 * If the new process will be in a different pid namespace
1177 * don't allow the creation of threads.
1178 */
1179 if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
1180 (task_active_pid_ns(current) != current->nsproxy->pid_ns))
1181 return ERR_PTR(-EINVAL);

but why is it ok for sys_unshare not to do that? Note that
in order for check_unshare_flags() to bail on &current->mm->mm_users > 1
you do have to set CLONE_VM (for inverse interpretation).

So it seems to me this isn't safe as is, and we need to at least
set CLONE_VM if CLONE_PID is set.

> Just to be certain I took a look through do_wait and
> forget_original_parent and there are no cases that make it any harder
> for children to be in the multiple pid namespaces than it is for
> children to be in the same pid namespace. I also performed a check to
> see if there were in uses of task->nsproxy_pid_ns I was not familiar
> with, but it is only used when allocating a new pid for a new task,
> and in checks to prevent craziness from happening.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/fork.c | 5 -----
> 1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 66635c8..eb45f1d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1818,11 +1818,6 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
> if (unshare_flags & CLONE_NEWUSER)
> unshare_flags |= CLONE_THREAD | CLONE_FS;
> /*
> - * If unsharing a pid namespace must also unshare the thread.
> - */
> - if (unshare_flags & CLONE_NEWPID)
> - unshare_flags |= CLONE_THREAD;
> - /*
> * If unsharing a thread from a thread group, must also unshare vm.
> */
> if (unshare_flags & CLONE_THREAD)
> --
> 1.7.5.4

2013-08-30 23:49:52

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>>
>> I goofed when I made unshare(CLONE_NEWPID) only work in a
>> single-threaded process. There is no need for that requirement and in
>> fact I analyzied things right for setns. The hard requirement
>> is for tasks that share a VM to all be in the pid namespace and
>> we properly prevent that in do_fork.
>
> I don't understand though - copy_process does have the right test:
>
> 1176 * If the new process will be in a different pid namespace
> 1177 * don't allow the creation of threads.
> 1178 */
> 1179 if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
> 1180 (task_active_pid_ns(current) != current->nsproxy->pid_ns))
> 1181 return ERR_PTR(-EINVAL);
>
> but why is it ok for sys_unshare not to do that? Note that
> in order for check_unshare_flags() to bail on &current->mm->mm_users > 1
> you do have to set CLONE_VM (for inverse interpretation).
>
> So it seems to me this isn't safe as is, and we need to at least
> set CLONE_VM if CLONE_PID is set.

Partly this is the difference in the meaning of the flags between
unshare and clone.

Basically in unshare all othat gets changed is
current->nsproxy->pid_ns_for_children (the rename is in the net tree).

So because unshare of the pid namespace does not actually effect the
current processes, just the pid namespace the children of the current
thread will be in this is safe.

And frankly having the checks be obviously different is a good thing
because it means that people will ask why in the world this is so and
realize the difference in meaning.

Eric

2013-08-31 05:31:30

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >>
> >> I goofed when I made unshare(CLONE_NEWPID) only work in a
> >> single-threaded process. There is no need for that requirement and in
> >> fact I analyzied things right for setns. The hard requirement
> >> is for tasks that share a VM to all be in the pid namespace and
> >> we properly prevent that in do_fork.
> >
> > I don't understand though - copy_process does have the right test:
> >
> > 1176 * If the new process will be in a different pid namespace
> > 1177 * don't allow the creation of threads.
> > 1178 */
> > 1179 if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
> > 1180 (task_active_pid_ns(current) != current->nsproxy->pid_ns))
> > 1181 return ERR_PTR(-EINVAL);
> >
> > but why is it ok for sys_unshare not to do that? Note that
> > in order for check_unshare_flags() to bail on &current->mm->mm_users > 1
> > you do have to set CLONE_VM (for inverse interpretation).
> >
> > So it seems to me this isn't safe as is, and we need to at least
> > set CLONE_VM if CLONE_PID is set.
>
> Partly this is the difference in the meaning of the flags between
> unshare and clone.
>
> Basically in unshare all othat gets changed is
> current->nsproxy->pid_ns_for_children (the rename is in the net tree).

D'oh, right. Thanks!

Acked-by: Serge Hallyn <[email protected]>


> So because unshare of the pid namespace does not actually effect the
> current processes, just the pid namespace the children of the current
> thread will be in this is safe.
>
> And frankly having the checks be obviously different is a good thing
> because it means that people will ask why in the world this is so and
> realize the difference in meaning.
>
> Eric

2013-09-08 17:06:12

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD

Sorry for delay, vacation.

On 08/29, Eric W. Biederman wrote:
>
> I goofed when I made unshare(CLONE_NEWPID) only work in a
> single-threaded process. There is no need for that requirement and in
> fact I analyzied things right for setns. The hard requirement
> is for tasks that share a VM to all be in the pid namespace and
> we properly prevent that in do_fork.

Yes, agreed, with the current meaning of ->pid_ns unshare(NEWPID)
looks safe even if the caller is multi-threaded... and this matches
pidns_install() which doesn't require single-threaded.

Oleg.