2015-12-12 20:22:27

by Jann Horn

[permalink] [raw]
Subject: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

ptrace_has_cap() checks whether the current process should be
treated as having a certain capability for ptrace checks
against another process. Until now, this was equivalent to
has_ns_capability(current, target_ns, CAP_SYS_PTRACE).

However, if a root-owned process wants to enter a user
namespace for some reason without knowing who owns it and
therefore can't change to the namespace owner's uid and gid
before entering, as soon as it has entered the namespace,
the namespace owner can attach to it via ptrace and thereby
gain access to its uid and gid.

While it is possible for the entering process to switch to
the uid of a claimed namespace owner before entering,
causing the attempt to enter to fail if the claimed uid is
wrong, this doesn't solve the problem of determining an
appropriate gid.

With this change, the entering process can first enter the
namespace and then safely inspect the namespace's
properties, e.g. through /proc/self/{uid_map,gid_map},
assuming that the namespace owner doesn't have access to
uid 0.

Signed-off-by: Jann Horn <[email protected]>
---
kernel/ptrace.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b760bae..c27770d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -207,12 +207,32 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
return ret;
}

-static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
+static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
{
+ struct user_namespace *tns = tcred->user_ns;
+ struct user_namespace *curns = current_cred()->user_ns;
+
+ /* When a root-owned process enters a user namespace created by a
+ * malicious user, the user shouldn't be able to execute code under
+ * uid 0 by attaching to the root-owned process via ptrace.
+ * Therefore, similar to the capable_wrt_inode_uidgid() check,
+ * verify that all the uids and gids of the target process are
+ * mapped into the current namespace.
+ * No fsuid/fsgid check because __ptrace_may_access doesn't do it
+ * either.
+ */
+ if (!kuid_has_mapping(curns, tcred->euid) ||
+ !kuid_has_mapping(curns, tcred->suid) ||
+ !kuid_has_mapping(curns, tcred->uid) ||
+ !kgid_has_mapping(curns, tcred->egid) ||
+ !kgid_has_mapping(curns, tcred->sgid) ||
+ !kgid_has_mapping(curns, tcred->gid))
+ return false;
+
if (mode & PTRACE_MODE_NOAUDIT)
- return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
+ return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
else
- return has_ns_capability(current, ns, CAP_SYS_PTRACE);
+ return has_ns_capability(current, tns, CAP_SYS_PTRACE);
}

/* Returns 0 on success, -errno on denial. */
@@ -241,7 +261,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
gid_eq(cred->gid, tcred->sgid) &&
gid_eq(cred->gid, tcred->gid))
goto ok;
- if (ptrace_has_cap(tcred->user_ns, mode))
+ if (ptrace_has_cap(tcred, mode))
goto ok;
rcu_read_unlock();
return -EPERM;
@@ -252,7 +272,7 @@ ok:
dumpable = get_dumpable(task->mm);
rcu_read_lock();
if (dumpable != SUID_DUMP_USER &&
- !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
+ !ptrace_has_cap(__task_cred(task), mode)) {
rcu_read_unlock();
return -EPERM;
}
--
2.1.4


2015-12-15 00:26:35

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 12, 2015 at 12:12 PM, Jann Horn <[email protected]> wrote:
> ptrace_has_cap() checks whether the current process should be
> treated as having a certain capability for ptrace checks
> against another process. Until now, this was equivalent to
> has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
>
> However, if a root-owned process wants to enter a user
> namespace for some reason without knowing who owns it and
> therefore can't change to the namespace owner's uid and gid
> before entering, as soon as it has entered the namespace,
> the namespace owner can attach to it via ptrace and thereby
> gain access to its uid and gid.
>
> While it is possible for the entering process to switch to
> the uid of a claimed namespace owner before entering,
> causing the attempt to enter to fail if the claimed uid is
> wrong, this doesn't solve the problem of determining an
> appropriate gid.
>
> With this change, the entering process can first enter the
> namespace and then safely inspect the namespace's
> properties, e.g. through /proc/self/{uid_map,gid_map},
> assuming that the namespace owner doesn't have access to
> uid 0.
>
> Signed-off-by: Jann Horn <[email protected]>

The analysis and the patch both look correct.

Reviewed-by: Andy Lutomirski <[email protected]>

Eric, care to opine?

--Andy

> ---
> kernel/ptrace.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index b760bae..c27770d 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -207,12 +207,32 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> return ret;
> }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> {
> + struct user_namespace *tns = tcred->user_ns;
> + struct user_namespace *curns = current_cred()->user_ns;
> +
> + /* When a root-owned process enters a user namespace created by a
> + * malicious user, the user shouldn't be able to execute code under
> + * uid 0 by attaching to the root-owned process via ptrace.
> + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> + * verify that all the uids and gids of the target process are
> + * mapped into the current namespace.
> + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> + * either.
> + */
> + if (!kuid_has_mapping(curns, tcred->euid) ||
> + !kuid_has_mapping(curns, tcred->suid) ||
> + !kuid_has_mapping(curns, tcred->uid) ||
> + !kgid_has_mapping(curns, tcred->egid) ||
> + !kgid_has_mapping(curns, tcred->sgid) ||
> + !kgid_has_mapping(curns, tcred->gid))
> + return false;
> +
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> }
>
> /* Returns 0 on success, -errno on denial. */
> @@ -241,7 +261,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> gid_eq(cred->gid, tcred->sgid) &&
> gid_eq(cred->gid, tcred->gid))
> goto ok;
> - if (ptrace_has_cap(tcred->user_ns, mode))
> + if (ptrace_has_cap(tcred, mode))
> goto ok;
> rcu_read_unlock();
> return -EPERM;
> @@ -252,7 +272,7 @@ ok:
> dumpable = get_dumpable(task->mm);
> rcu_read_lock();
> if (dumpable != SUID_DUMP_USER &&
> - !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
> + !ptrace_has_cap(__task_cred(task), mode)) {
> rcu_read_unlock();
> return -EPERM;
> }
> --
> 2.1.4
>



--
Andy Lutomirski
AMA Capital Management, LLC

2015-12-17 18:59:58

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 12, 2015 at 09:12:41PM +0100, Jann Horn wrote:
> ptrace_has_cap() checks whether the current process should be
> treated as having a certain capability for ptrace checks
> against another process. Until now, this was equivalent to
> has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
>
> However, if a root-owned process wants to enter a user
> namespace for some reason without knowing who owns it and
> therefore can't change to the namespace owner's uid and gid
> before entering, as soon as it has entered the namespace,
> the namespace owner can attach to it via ptrace and thereby
> gain access to its uid and gid.
>
> While it is possible for the entering process to switch to
> the uid of a claimed namespace owner before entering,
> causing the attempt to enter to fail if the claimed uid is
> wrong, this doesn't solve the problem of determining an
> appropriate gid.
>
> With this change, the entering process can first enter the
> namespace and then safely inspect the namespace's
> properties, e.g. through /proc/self/{uid_map,gid_map},
> assuming that the namespace owner doesn't have access to
> uid 0.
>
> Signed-off-by: Jann Horn <[email protected]>

Thanks, Jann! This is indeed pretty high priority.

Acked-by: Serge Hallyn <[email protected]>

> ---
> kernel/ptrace.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index b760bae..c27770d 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -207,12 +207,32 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> return ret;
> }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> {
> + struct user_namespace *tns = tcred->user_ns;
> + struct user_namespace *curns = current_cred()->user_ns;
> +
> + /* When a root-owned process enters a user namespace created by a
> + * malicious user, the user shouldn't be able to execute code under
> + * uid 0 by attaching to the root-owned process via ptrace.
> + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> + * verify that all the uids and gids of the target process are
> + * mapped into the current namespace.
> + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> + * either.
> + */
> + if (!kuid_has_mapping(curns, tcred->euid) ||
> + !kuid_has_mapping(curns, tcred->suid) ||
> + !kuid_has_mapping(curns, tcred->uid) ||
> + !kgid_has_mapping(curns, tcred->egid) ||
> + !kgid_has_mapping(curns, tcred->sgid) ||
> + !kgid_has_mapping(curns, tcred->gid))
> + return false;
> +
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> }
>
> /* Returns 0 on success, -errno on denial. */
> @@ -241,7 +261,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> gid_eq(cred->gid, tcred->sgid) &&
> gid_eq(cred->gid, tcred->gid))
> goto ok;
> - if (ptrace_has_cap(tcred->user_ns, mode))
> + if (ptrace_has_cap(tcred, mode))
> goto ok;
> rcu_read_unlock();
> return -EPERM;
> @@ -252,7 +272,7 @@ ok:
> dumpable = get_dumpable(task->mm);
> rcu_read_lock();
> if (dumpable != SUID_DUMP_USER &&
> - !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
> + !ptrace_has_cap(__task_cred(task), mode)) {
> rcu_read_unlock();
> return -EPERM;
> }
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-12-26 01:10:43

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 12, 2015 at 09:12:41PM +0100, Jann Horn wrote:
> With this change, the entering process can first enter the
> namespace and then safely inspect the namespace's
> properties, e.g. through /proc/self/{uid_map,gid_map},
> assuming that the namespace owner doesn't have access to
> uid 0.

Actually, I think I missed something there. Well, at least it
should not directly lead to a container escape.


> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> {
> + struct user_namespace *tns = tcred->user_ns;
> + struct user_namespace *curns = current_cred()->user_ns;
> +
> + /* When a root-owned process enters a user namespace created by a
> + * malicious user, the user shouldn't be able to execute code under
> + * uid 0 by attaching to the root-owned process via ptrace.
> + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> + * verify that all the uids and gids of the target process are
> + * mapped into the current namespace.
> + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> + * either.
> + */
> + if (!kuid_has_mapping(curns, tcred->euid) ||
> + !kuid_has_mapping(curns, tcred->suid) ||
> + !kuid_has_mapping(curns, tcred->uid) ||
> + !kgid_has_mapping(curns, tcred->egid) ||
> + !kgid_has_mapping(curns, tcred->sgid) ||
> + !kgid_has_mapping(curns, tcred->gid))
> + return false;
> +
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> }

If the namespace owner can run code in the init namespace, the kuids are
mapped into curns but he is still capable wrt the target namespace.

I think a proper fix should first determine the highest parent of
tcred->user_ns in which the caller still has privs, then do the
kxid_has_mapping() checks in there.


Attachments:
(No filename) (2.03 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-12-26 02:52:51

by Jann Horn

[permalink] [raw]
Subject: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

ptrace_has_cap() checks whether the current process should be
treated as having a certain capability for ptrace checks
against another process. Until now, this was equivalent to
has_ns_capability(current, target_ns, CAP_SYS_PTRACE).

However, if a root-owned process wants to enter a user
namespace for some reason without knowing who owns it and
therefore can't change to the namespace owner's uid and gid
before entering, as soon as it has entered the namespace,
the namespace owner can attach to it via ptrace and thereby
gain access to its uid and gid.

While it is possible for the entering process to switch to
the uid of a claimed namespace owner before entering,
causing the attempt to enter to fail if the claimed uid is
wrong, this doesn't solve the problem of determining an
appropriate gid.

With this change, the entering process can first enter the
namespace and then safely inspect the namespace's
properties, e.g. through /proc/self/{uid_map,gid_map},
assuming that the namespace owner doesn't have access to
uid 0.

Changed in v2: The caller needs to be capable in the
namespace into which tcred's uids/gids can be mapped.

Signed-off-by: Jann Horn <[email protected]>
---
kernel/ptrace.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b760bae..260a08d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -20,6 +20,7 @@
#include <linux/uio.h>
#include <linux/audit.h>
#include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#include <linux/regset.h>
@@ -207,12 +208,34 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
return ret;
}

-static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
+static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
{
+ struct user_namespace *tns = tcred->user_ns;
+
+ /* When a root-owned process enters a user namespace created by a
+ * malicious user, the user shouldn't be able to execute code under
+ * uid 0 by attaching to the root-owned process via ptrace.
+ * Therefore, similar to the capable_wrt_inode_uidgid() check,
+ * verify that all the uids and gids of the target process are
+ * mapped into a namespace below the current one in which the caller
+ * is capable.
+ * No fsuid/fsgid check because __ptrace_may_access doesn't do it
+ * either.
+ */
+ while (
+ !kuid_has_mapping(tns, tcred->euid) ||
+ !kuid_has_mapping(tns, tcred->suid) ||
+ !kuid_has_mapping(tns, tcred->uid) ||
+ !kgid_has_mapping(tns, tcred->egid) ||
+ !kgid_has_mapping(tns, tcred->sgid) ||
+ !kgid_has_mapping(tns, tcred->gid)) {
+ tns = tns->parent;
+ }
+
if (mode & PTRACE_MODE_NOAUDIT)
- return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
+ return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
else
- return has_ns_capability(current, ns, CAP_SYS_PTRACE);
+ return has_ns_capability(current, tns, CAP_SYS_PTRACE);
}

/* Returns 0 on success, -errno on denial. */
@@ -241,7 +264,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
gid_eq(cred->gid, tcred->sgid) &&
gid_eq(cred->gid, tcred->gid))
goto ok;
- if (ptrace_has_cap(tcred->user_ns, mode))
+ if (ptrace_has_cap(tcred, mode))
goto ok;
rcu_read_unlock();
return -EPERM;
@@ -252,7 +275,7 @@ ok:
dumpable = get_dumpable(task->mm);
rcu_read_lock();
if (dumpable != SUID_DUMP_USER &&
- !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
+ !ptrace_has_cap(__task_cred(task), mode)) {
rcu_read_unlock();
return -EPERM;
}
--
2.1.4

2015-12-26 20:23:51

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 02:10:38AM +0100, Jann Horn wrote:
> On Sat, Dec 12, 2015 at 09:12:41PM +0100, Jann Horn wrote:
> > With this change, the entering process can first enter the
> > namespace and then safely inspect the namespace's
> > properties, e.g. through /proc/self/{uid_map,gid_map},
> > assuming that the namespace owner doesn't have access to
> > uid 0.
>
> Actually, I think I missed something there. Well, at least it
> should not directly lead to a container escape.
>
>
> > -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> > +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> > {
> > + struct user_namespace *tns = tcred->user_ns;
> > + struct user_namespace *curns = current_cred()->user_ns;
> > +
> > + /* When a root-owned process enters a user namespace created by a
> > + * malicious user, the user shouldn't be able to execute code under
> > + * uid 0 by attaching to the root-owned process via ptrace.
> > + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> > + * verify that all the uids and gids of the target process are
> > + * mapped into the current namespace.
> > + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> > + * either.
> > + */
> > + if (!kuid_has_mapping(curns, tcred->euid) ||
> > + !kuid_has_mapping(curns, tcred->suid) ||
> > + !kuid_has_mapping(curns, tcred->uid) ||
> > + !kgid_has_mapping(curns, tcred->egid) ||
> > + !kgid_has_mapping(curns, tcred->sgid) ||
> > + !kgid_has_mapping(curns, tcred->gid))
> > + return false;
> > +
> > if (mode & PTRACE_MODE_NOAUDIT)
> > - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> > + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> > else
> > - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> > + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> > }
>
> If the namespace owner can run code in the init namespace, the kuids are
> mapped into curns but he is still capable wrt the target namespace.
>
> I think a proper fix should first determine the highest parent of
> tcred->user_ns in which the caller still has privs, then do the
> kxid_has_mapping() checks in there.

Hi,

I don't quite follow what you are concerned about. Based on the new
patch you sent, I assume it's not the case where the tcred's kuid is
actually mapped into the container. So is it the case where I
unshare a userns which unshares a userns, then setns from the grandparent
into the child? And if so, the concern is that if the setns()ing task's
kuid is mappable all along into the grandhild, then container root should
be able to ptrace it?

thanks,
-serge

2015-12-26 20:55:54

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 02:23:45PM -0600, Serge E. Hallyn wrote:
> On Sat, Dec 26, 2015 at 02:10:38AM +0100, Jann Horn wrote:
> > On Sat, Dec 12, 2015 at 09:12:41PM +0100, Jann Horn wrote:
> > > With this change, the entering process can first enter the
> > > namespace and then safely inspect the namespace's
> > > properties, e.g. through /proc/self/{uid_map,gid_map},
> > > assuming that the namespace owner doesn't have access to
> > > uid 0.
> >
> > Actually, I think I missed something there. Well, at least it
> > should not directly lead to a container escape.
> >
> >
> > > -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> > > +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> > > {
> > > + struct user_namespace *tns = tcred->user_ns;
> > > + struct user_namespace *curns = current_cred()->user_ns;
> > > +
> > > + /* When a root-owned process enters a user namespace created by a
> > > + * malicious user, the user shouldn't be able to execute code under
> > > + * uid 0 by attaching to the root-owned process via ptrace.
> > > + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> > > + * verify that all the uids and gids of the target process are
> > > + * mapped into the current namespace.
> > > + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> > > + * either.
> > > + */
> > > + if (!kuid_has_mapping(curns, tcred->euid) ||
> > > + !kuid_has_mapping(curns, tcred->suid) ||
> > > + !kuid_has_mapping(curns, tcred->uid) ||
> > > + !kgid_has_mapping(curns, tcred->egid) ||
> > > + !kgid_has_mapping(curns, tcred->sgid) ||
> > > + !kgid_has_mapping(curns, tcred->gid))
> > > + return false;
> > > +
> > > if (mode & PTRACE_MODE_NOAUDIT)
> > > - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> > > + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> > > else
> > > - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> > > + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> > > }
> >
> > If the namespace owner can run code in the init namespace, the kuids are
> > mapped into curns but he is still capable wrt the target namespace.
> >
> > I think a proper fix should first determine the highest parent of
> > tcred->user_ns in which the caller still has privs, then do the
> > kxid_has_mapping() checks in there.
>
> Hi,
>
> I don't quite follow what you are concerned about. Based on the new
> patch you sent, I assume it's not the case where the tcred's kuid is
> actually mapped into the container. So is it the case where I
> unshare a userns which unshares a userns, then setns from the grandparent
> into the child? And if so, the concern is that if the setns()ing task's
> kuid is mappable all along into the grandhild, then container root should
> be able to ptrace it?

Consider the following scenario:

init_user_ns has a child namespace (I'll call it child_ns).
child_ns is owned by an attacker (child_ns->owner == attacker_kuid).
The attacking process has current_cred()->euid == attacker_kuid and lives
in init_user_ns (which means it's capable in child_ns).

The victim process (with euid==0) enters the namespace, then the attacking
process tries to attach to it. ptrace_has_cap(tcred, mode) would, with my
old patch, still be true because current is capable in tcred->user_ns and
all kuids are mapped into init_user_ns.

The new patch uses the following rule for cap checks:

The caller is capable relative to a target if a user namespace exists
for which all of the following conditions are true:

- The caller has the requested capability inside the namespace.
- The target is inside the namespace (either directly or in a child).
- The target's kuids and kgids are mapped into the namespace.

The first rule is enforced by the has_ns_capability(..., tns, ...) or
has_ns_capability_noaudit(..., tns, ...) call at the bottom.

The second rule is implicitly true because tns initially is the target's
user namespace and then moves up through ->parent.

The third rule is enforced by the while loop.


This prevents the attack I described, but e.g. still allows someone who
is capable in init_user_ns to ptrace anything, no matter in what weird
namespace the target is - if a task was ptrace-able for a process before
it called clone(CLONE_NEWUSER) / unshare(CLONE_NEWUSER) /
setns(, CLONE_NEWUSER), it should still be ptrace-able afterwards.
(Unless access was permitted based on the introspection rule.)


Attachments:
(No filename) (4.42 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-12-26 21:08:29

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 09:55:50PM +0100, Jann Horn wrote:
> On Sat, Dec 26, 2015 at 02:23:45PM -0600, Serge E. Hallyn wrote:
> > On Sat, Dec 26, 2015 at 02:10:38AM +0100, Jann Horn wrote:
> > > On Sat, Dec 12, 2015 at 09:12:41PM +0100, Jann Horn wrote:
> > > > With this change, the entering process can first enter the
> > > > namespace and then safely inspect the namespace's
> > > > properties, e.g. through /proc/self/{uid_map,gid_map},
> > > > assuming that the namespace owner doesn't have access to
> > > > uid 0.
> > >
> > > Actually, I think I missed something there. Well, at least it
> > > should not directly lead to a container escape.
> > >
> > >
> > > > -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> > > > +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> > > > {
> > > > + struct user_namespace *tns = tcred->user_ns;
> > > > + struct user_namespace *curns = current_cred()->user_ns;
> > > > +
> > > > + /* When a root-owned process enters a user namespace created by a
> > > > + * malicious user, the user shouldn't be able to execute code under
> > > > + * uid 0 by attaching to the root-owned process via ptrace.
> > > > + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> > > > + * verify that all the uids and gids of the target process are
> > > > + * mapped into the current namespace.
> > > > + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> > > > + * either.
> > > > + */
> > > > + if (!kuid_has_mapping(curns, tcred->euid) ||
> > > > + !kuid_has_mapping(curns, tcred->suid) ||
> > > > + !kuid_has_mapping(curns, tcred->uid) ||
> > > > + !kgid_has_mapping(curns, tcred->egid) ||
> > > > + !kgid_has_mapping(curns, tcred->sgid) ||
> > > > + !kgid_has_mapping(curns, tcred->gid))
> > > > + return false;
> > > > +
> > > > if (mode & PTRACE_MODE_NOAUDIT)
> > > > - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> > > > + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> > > > else
> > > > - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> > > > + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> > > > }
> > >
> > > If the namespace owner can run code in the init namespace, the kuids are
> > > mapped into curns but he is still capable wrt the target namespace.
> > >
> > > I think a proper fix should first determine the highest parent of
> > > tcred->user_ns in which the caller still has privs, then do the
> > > kxid_has_mapping() checks in there.
> >
> > Hi,
> >
> > I don't quite follow what you are concerned about. Based on the new
> > patch you sent, I assume it's not the case where the tcred's kuid is
> > actually mapped into the container. So is it the case where I
> > unshare a userns which unshares a userns, then setns from the grandparent
> > into the child? And if so, the concern is that if the setns()ing task's
> > kuid is mappable all along into the grandhild, then container root should
> > be able to ptrace it?
>
> Consider the following scenario:
>
> init_user_ns has a child namespace (I'll call it child_ns).
> child_ns is owned by an attacker (child_ns->owner == attacker_kuid).
> The attacking process has current_cred()->euid == attacker_kuid and lives
> in init_user_ns (which means it's capable in child_ns).

Ah, right. Special.

Thanks.

-serge

2015-12-26 21:17:33

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 03:52:31AM +0100, Jann Horn wrote:
> ptrace_has_cap() checks whether the current process should be
> treated as having a certain capability for ptrace checks
> against another process. Until now, this was equivalent to
> has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
>
> However, if a root-owned process wants to enter a user
> namespace for some reason without knowing who owns it and
> therefore can't change to the namespace owner's uid and gid
> before entering, as soon as it has entered the namespace,
> the namespace owner can attach to it via ptrace and thereby
> gain access to its uid and gid.
>
> While it is possible for the entering process to switch to
> the uid of a claimed namespace owner before entering,
> causing the attempt to enter to fail if the claimed uid is
> wrong, this doesn't solve the problem of determining an
> appropriate gid.
>
> With this change, the entering process can first enter the
> namespace and then safely inspect the namespace's
> properties, e.g. through /proc/self/{uid_map,gid_map},
> assuming that the namespace owner doesn't have access to
> uid 0.
>
> Changed in v2: The caller needs to be capable in the
> namespace into which tcred's uids/gids can be mapped.
>
> Signed-off-by: Jann Horn <[email protected]>
> ---
> kernel/ptrace.c | 33 ++++++++++++++++++++++++++++-----
> 1 file changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index b760bae..260a08d 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -20,6 +20,7 @@
> #include <linux/uio.h>
> #include <linux/audit.h>
> #include <linux/pid_namespace.h>
> +#include <linux/user_namespace.h>
> #include <linux/syscalls.h>
> #include <linux/uaccess.h>
> #include <linux/regset.h>
> @@ -207,12 +208,34 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> return ret;
> }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> {
> + struct user_namespace *tns = tcred->user_ns;
> +
> + /* When a root-owned process enters a user namespace created by a
> + * malicious user, the user shouldn't be able to execute code under
> + * uid 0 by attaching to the root-owned process via ptrace.
> + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> + * verify that all the uids and gids of the target process are
> + * mapped into a namespace below the current one in which the caller
> + * is capable.
> + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> + * either.
> + */
> + while (
> + !kuid_has_mapping(tns, tcred->euid) ||
> + !kuid_has_mapping(tns, tcred->suid) ||
> + !kuid_has_mapping(tns, tcred->uid) ||
> + !kgid_has_mapping(tns, tcred->egid) ||
> + !kgid_has_mapping(tns, tcred->sgid) ||
> + !kgid_has_mapping(tns, tcred->gid)) {
> + tns = tns->parent;

Sorry, i can't quite remember - is there a way for a task in init_user_ns to have
INVALID_UID | INVALID_GID ? I.e. any point in breaking here if tns == &init_user_n?

> + }
> +
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> }
>
> /* Returns 0 on success, -errno on denial. */
> @@ -241,7 +264,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> gid_eq(cred->gid, tcred->sgid) &&
> gid_eq(cred->gid, tcred->gid))
> goto ok;
> - if (ptrace_has_cap(tcred->user_ns, mode))
> + if (ptrace_has_cap(tcred, mode))
> goto ok;
> rcu_read_unlock();
> return -EPERM;
> @@ -252,7 +275,7 @@ ok:
> dumpable = get_dumpable(task->mm);
> rcu_read_lock();
> if (dumpable != SUID_DUMP_USER &&
> - !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
> + !ptrace_has_cap(__task_cred(task), mode)) {
> rcu_read_unlock();
> return -EPERM;
> }
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-12-26 21:27:37

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 03:17:29PM -0600, Serge E. Hallyn wrote:
> On Sat, Dec 26, 2015 at 03:52:31AM +0100, Jann Horn wrote:
> > ptrace_has_cap() checks whether the current process should be
> > treated as having a certain capability for ptrace checks
> > against another process. Until now, this was equivalent to
> > has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
> >
> > However, if a root-owned process wants to enter a user
> > namespace for some reason without knowing who owns it and
> > therefore can't change to the namespace owner's uid and gid
> > before entering, as soon as it has entered the namespace,
> > the namespace owner can attach to it via ptrace and thereby
> > gain access to its uid and gid.
> >
> > While it is possible for the entering process to switch to
> > the uid of a claimed namespace owner before entering,
> > causing the attempt to enter to fail if the claimed uid is
> > wrong, this doesn't solve the problem of determining an
> > appropriate gid.
> >
> > With this change, the entering process can first enter the
> > namespace and then safely inspect the namespace's
> > properties, e.g. through /proc/self/{uid_map,gid_map},
> > assuming that the namespace owner doesn't have access to
> > uid 0.
> >
> > Changed in v2: The caller needs to be capable in the
> > namespace into which tcred's uids/gids can be mapped.
> >
> > Signed-off-by: Jann Horn <[email protected]>
> > ---
> > kernel/ptrace.c | 33 ++++++++++++++++++++++++++++-----
> > 1 file changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> > index b760bae..260a08d 100644
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -20,6 +20,7 @@
> > #include <linux/uio.h>
> > #include <linux/audit.h>
> > #include <linux/pid_namespace.h>
> > +#include <linux/user_namespace.h>
> > #include <linux/syscalls.h>
> > #include <linux/uaccess.h>
> > #include <linux/regset.h>
> > @@ -207,12 +208,34 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> > return ret;
> > }
> >
> > -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> > +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> > {
> > + struct user_namespace *tns = tcred->user_ns;
> > +
> > + /* When a root-owned process enters a user namespace created by a
> > + * malicious user, the user shouldn't be able to execute code under
> > + * uid 0 by attaching to the root-owned process via ptrace.
> > + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> > + * verify that all the uids and gids of the target process are
> > + * mapped into a namespace below the current one in which the caller
> > + * is capable.
> > + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> > + * either.
> > + */
> > + while (
> > + !kuid_has_mapping(tns, tcred->euid) ||
> > + !kuid_has_mapping(tns, tcred->suid) ||
> > + !kuid_has_mapping(tns, tcred->uid) ||
> > + !kgid_has_mapping(tns, tcred->egid) ||
> > + !kgid_has_mapping(tns, tcred->sgid) ||
> > + !kgid_has_mapping(tns, tcred->gid)) {
> > + tns = tns->parent;
>
> Sorry, i can't quite remember - is there a way for a task in init_user_ns to have
> INVALID_UID | INVALID_GID ? I.e. any point in breaking here if tns == &init_user_n?

I assumed that there isn't because the comment above the definition of from_kuid()
says so. Checking... the syscalls for setting uid/gid seem to enforce that uid/gid
aren't -1, and setuid/setgid executables require the uid/gid to be mapped. So it
seems to be true.


> > + }
> > +
> > if (mode & PTRACE_MODE_NOAUDIT)
> > - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> > + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> > else
> > - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> > + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> > }
> >
> > /* Returns 0 on success, -errno on denial. */
> > @@ -241,7 +264,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> > gid_eq(cred->gid, tcred->sgid) &&
> > gid_eq(cred->gid, tcred->gid))
> > goto ok;
> > - if (ptrace_has_cap(tcred->user_ns, mode))
> > + if (ptrace_has_cap(tcred, mode))
> > goto ok;
> > rcu_read_unlock();
> > return -EPERM;
> > @@ -252,7 +275,7 @@ ok:
> > dumpable = get_dumpable(task->mm);
> > rcu_read_lock();
> > if (dumpable != SUID_DUMP_USER &&
> > - !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
> > + !ptrace_has_cap(__task_cred(task), mode)) {
> > rcu_read_unlock();
> > return -EPERM;
> > }
> > --
> > 2.1.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/


Attachments:
(No filename) (4.83 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-12-26 21:50:00

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 10:27:33PM +0100, Jann Horn wrote:
> On Sat, Dec 26, 2015 at 03:17:29PM -0600, Serge E. Hallyn wrote:
> > On Sat, Dec 26, 2015 at 03:52:31AM +0100, Jann Horn wrote:
> > > ptrace_has_cap() checks whether the current process should be
> > > treated as having a certain capability for ptrace checks
> > > against another process. Until now, this was equivalent to
> > > has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
> > >
> > > However, if a root-owned process wants to enter a user
> > > namespace for some reason without knowing who owns it and
> > > therefore can't change to the namespace owner's uid and gid
> > > before entering, as soon as it has entered the namespace,
> > > the namespace owner can attach to it via ptrace and thereby
> > > gain access to its uid and gid.
> > >
> > > While it is possible for the entering process to switch to
> > > the uid of a claimed namespace owner before entering,
> > > causing the attempt to enter to fail if the claimed uid is
> > > wrong, this doesn't solve the problem of determining an
> > > appropriate gid.
> > >
> > > With this change, the entering process can first enter the
> > > namespace and then safely inspect the namespace's
> > > properties, e.g. through /proc/self/{uid_map,gid_map},
> > > assuming that the namespace owner doesn't have access to
> > > uid 0.
> > >
> > > Changed in v2: The caller needs to be capable in the
> > > namespace into which tcred's uids/gids can be mapped.
> > >
> > > Signed-off-by: Jann Horn <[email protected]>
> > > ---
> > > kernel/ptrace.c | 33 ++++++++++++++++++++++++++++-----
> > > 1 file changed, 28 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> > > index b760bae..260a08d 100644
> > > --- a/kernel/ptrace.c
> > > +++ b/kernel/ptrace.c
> > > @@ -20,6 +20,7 @@
> > > #include <linux/uio.h>
> > > #include <linux/audit.h>
> > > #include <linux/pid_namespace.h>
> > > +#include <linux/user_namespace.h>
> > > #include <linux/syscalls.h>
> > > #include <linux/uaccess.h>
> > > #include <linux/regset.h>
> > > @@ -207,12 +208,34 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> > > return ret;
> > > }
> > >
> > > -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> > > +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> > > {
> > > + struct user_namespace *tns = tcred->user_ns;
> > > +
> > > + /* When a root-owned process enters a user namespace created by a
> > > + * malicious user, the user shouldn't be able to execute code under
> > > + * uid 0 by attaching to the root-owned process via ptrace.
> > > + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> > > + * verify that all the uids and gids of the target process are
> > > + * mapped into a namespace below the current one in which the caller
> > > + * is capable.
> > > + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> > > + * either.
> > > + */
> > > + while (
> > > + !kuid_has_mapping(tns, tcred->euid) ||
> > > + !kuid_has_mapping(tns, tcred->suid) ||
> > > + !kuid_has_mapping(tns, tcred->uid) ||
> > > + !kgid_has_mapping(tns, tcred->egid) ||
> > > + !kgid_has_mapping(tns, tcred->sgid) ||
> > > + !kgid_has_mapping(tns, tcred->gid)) {
> > > + tns = tns->parent;
> >
> > Sorry, i can't quite remember - is there a way for a task in init_user_ns to have
> > INVALID_UID | INVALID_GID ? I.e. any point in breaking here if tns == &init_user_n?
>
> I assumed that there isn't because the comment above the definition of from_kuid()
> says so. Checking... the syscalls for setting uid/gid seem to enforce that uid/gid
> aren't -1, and setuid/setgid executables require the uid/gid to be mapped. So it
> seems to be true.

Yeah, I knew I'd read it somewhere but couldn't find the comment. Thanks.

2015-12-26 21:51:04

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 03:52:31AM +0100, Jann Horn wrote:
> ptrace_has_cap() checks whether the current process should be
> treated as having a certain capability for ptrace checks
> against another process. Until now, this was equivalent to
> has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
>
> However, if a root-owned process wants to enter a user
> namespace for some reason without knowing who owns it and
> therefore can't change to the namespace owner's uid and gid
> before entering, as soon as it has entered the namespace,
> the namespace owner can attach to it via ptrace and thereby
> gain access to its uid and gid.
>
> While it is possible for the entering process to switch to
> the uid of a claimed namespace owner before entering,
> causing the attempt to enter to fail if the claimed uid is
> wrong, this doesn't solve the problem of determining an
> appropriate gid.
>
> With this change, the entering process can first enter the
> namespace and then safely inspect the namespace's
> properties, e.g. through /proc/self/{uid_map,gid_map},
> assuming that the namespace owner doesn't have access to
> uid 0.
>
> Changed in v2: The caller needs to be capable in the
> namespace into which tcred's uids/gids can be mapped.
>
> Signed-off-by: Jann Horn <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> kernel/ptrace.c | 33 ++++++++++++++++++++++++++++-----
> 1 file changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index b760bae..260a08d 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -20,6 +20,7 @@
> #include <linux/uio.h>
> #include <linux/audit.h>
> #include <linux/pid_namespace.h>
> +#include <linux/user_namespace.h>
> #include <linux/syscalls.h>
> #include <linux/uaccess.h>
> #include <linux/regset.h>
> @@ -207,12 +208,34 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> return ret;
> }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> {
> + struct user_namespace *tns = tcred->user_ns;
> +
> + /* When a root-owned process enters a user namespace created by a
> + * malicious user, the user shouldn't be able to execute code under
> + * uid 0 by attaching to the root-owned process via ptrace.
> + * Therefore, similar to the capable_wrt_inode_uidgid() check,
> + * verify that all the uids and gids of the target process are
> + * mapped into a namespace below the current one in which the caller
> + * is capable.
> + * No fsuid/fsgid check because __ptrace_may_access doesn't do it
> + * either.
> + */
> + while (
> + !kuid_has_mapping(tns, tcred->euid) ||
> + !kuid_has_mapping(tns, tcred->suid) ||
> + !kuid_has_mapping(tns, tcred->uid) ||
> + !kgid_has_mapping(tns, tcred->egid) ||
> + !kgid_has_mapping(tns, tcred->sgid) ||
> + !kgid_has_mapping(tns, tcred->gid)) {
> + tns = tns->parent;
> + }
> +
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability_noaudit(current, tns, CAP_SYS_PTRACE);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> }
>
> /* Returns 0 on success, -errno on denial. */
> @@ -241,7 +264,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> gid_eq(cred->gid, tcred->sgid) &&
> gid_eq(cred->gid, tcred->gid))
> goto ok;
> - if (ptrace_has_cap(tcred->user_ns, mode))
> + if (ptrace_has_cap(tcred, mode))
> goto ok;
> rcu_read_unlock();
> return -EPERM;
> @@ -252,7 +275,7 @@ ok:
> dumpable = get_dumpable(task->mm);
> rcu_read_lock();
> if (dumpable != SUID_DUMP_USER &&
> - !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
> + !ptrace_has_cap(__task_cred(task), mode)) {
> rcu_read_unlock();
> return -EPERM;
> }
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-12-27 02:03:22

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] ptrace: being capable wrt a process requires mapped uids/gids

On Sat, Dec 26, 2015 at 1:51 PM, Serge E. Hallyn
<[email protected]> wrote:
> On Sat, Dec 26, 2015 at 03:52:31AM +0100, Jann Horn wrote:
>> ptrace_has_cap() checks whether the current process should be
>> treated as having a certain capability for ptrace checks
>> against another process. Until now, this was equivalent to
>> has_ns_capability(current, target_ns, CAP_SYS_PTRACE).
>>
>> However, if a root-owned process wants to enter a user
>> namespace for some reason without knowing who owns it and
>> therefore can't change to the namespace owner's uid and gid
>> before entering, as soon as it has entered the namespace,
>> the namespace owner can attach to it via ptrace and thereby
>> gain access to its uid and gid.
>>
>> While it is possible for the entering process to switch to
>> the uid of a claimed namespace owner before entering,
>> causing the attempt to enter to fail if the claimed uid is
>> wrong, this doesn't solve the problem of determining an
>> appropriate gid.
>>
>> With this change, the entering process can first enter the
>> namespace and then safely inspect the namespace's
>> properties, e.g. through /proc/self/{uid_map,gid_map},
>> assuming that the namespace owner doesn't have access to
>> uid 0.
>>
>> Changed in v2: The caller needs to be capable in the
>> namespace into which tcred's uids/gids can be mapped.
>>
>> Signed-off-by: Jann Horn <[email protected]>
>
> Acked-by: Serge Hallyn <[email protected]>

Acked-by: Andy Lutomirski <[email protected]>

Who's going to apply this? Linus? Eric?

--Andy