2016-10-24 11:00:07

by Cyrill Gorcunov

[permalink] [raw]
Subject: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

Hi Eric! A few days ago we've noticed that our zombie00 test case started
failing: https://ci.openvz.org/job/CRIU/view/All/job/CRIU-linux-next/406/console
---
======================== Run zdtm/static/zombie00 in h =========================
Start test
./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
Run criu dump
Run criu restore
Send the 15 signal to 30
Wait for zdtm/static/zombie00(30) to die for 0.100000
################ Test zdtm/static/zombie00 FAIL at result check ################

I've narrowed problem down to commit

| From ce99dd5fd5f600f9f4f0d37bb8847c1cb7c6e4fc Mon Sep 17 00:00:00 2001
| From: "Eric W. Biederman" <[email protected]>
| Date: Thu, 13 Oct 2016 21:23:16 -0500
| Subject: [PATCH] mm: Add a user_ns owner to mm_struct and fix
| ptrace_may_access
|
| During exec dumpable is cleared if the file that is being executed is
| not readable by the user executing the file. A bug in
| ptrace_may_access allows reading the file if the executable happens to
| enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
| unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

and the reason is that the zombie tasks do not have task::mm and in resut
we're obtaining -EPERM when trying to read task->exit_code from /proc/pid/stat.

Looking into commit I suspect when mm = NULL we've to move back the test
ptrace_has_cap(__task_cred(task)->user_ns, mode)?


2016-10-24 19:01:00

by Andrei Vagin

[permalink] [raw]
Subject: Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

On Mon, Oct 24, 2016 at 01:59:59PM +0300, Cyrill Gorcunov wrote:
> Hi Eric! A few days ago we've noticed that our zombie00 test case started
> failing: https://ci.openvz.org/job/CRIU/view/All/job/CRIU-linux-next/406/console
> ---
> ======================== Run zdtm/static/zombie00 in h =========================
> Start test
> ./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
> Run criu dump
> Run criu restore
> Send the 15 signal to 30
> Wait for zdtm/static/zombie00(30) to die for 0.100000
> ################ Test zdtm/static/zombie00 FAIL at result check ################
>
> I've narrowed problem down to commit
>
> | From ce99dd5fd5f600f9f4f0d37bb8847c1cb7c6e4fc Mon Sep 17 00:00:00 2001
> | From: "Eric W. Biederman" <[email protected]>
> | Date: Thu, 13 Oct 2016 21:23:16 -0500
> | Subject: [PATCH] mm: Add a user_ns owner to mm_struct and fix
> | ptrace_may_access
> |
> | During exec dumpable is cleared if the file that is being executed is
> | not readable by the user executing the file. A bug in
> | ptrace_may_access allows reading the file if the executable happens to
> | enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
> | unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
>
> and the reason is that the zombie tasks do not have task::mm and in resut
> we're obtaining -EPERM when trying to read task->exit_code from /proc/pid/stat.

To be precise,
we are obtaining 0 instead of task->exit_ode, because permitted is
false:

static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,

...

permitted = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS |
PTRACE_MODE_NOAUDIT);

...

if (permitted)
seq_put_decimal_ll(m, " ", task->exit_code);
else
seq_puts(m, " 0");

>
> Looking into commit I suspect when mm = NULL we've to move back the test
> ptrace_has_cap(__task_cred(task)->user_ns, mode)?

2016-10-24 19:03:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access


Adding the containers list because that is the general place for these
kinds of discussions.

Cyrill Gorcunov <[email protected]> writes:

> Hi Eric! A few days ago we've noticed that our zombie00 test case started
> failing: https://ci.openvz.org/job/CRIU/view/All/job/CRIU-linux-next/406/console

> ---
> ======================== Run zdtm/static/zombie00 in h =========================
> Start test
> ./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
> Run criu dump
> Run criu restore
> Send the 15 signal to 30
> Wait for zdtm/static/zombie00(30) to die for 0.100000
> ################ Test zdtm/static/zombie00 FAIL at result check ################
>
> I've narrowed problem down to commit
>
> | From ce99dd5fd5f600f9f4f0d37bb8847c1cb7c6e4fc Mon Sep 17 00:00:00 2001
> | From: "Eric W. Biederman" <[email protected]>
> | Date: Thu, 13 Oct 2016 21:23:16 -0500
> | Subject: [PATCH] mm: Add a user_ns owner to mm_struct and fix
> | ptrace_may_access
> |
> | During exec dumpable is cleared if the file that is being executed is
> | not readable by the user executing the file. A bug in
> | ptrace_may_access allows reading the file if the executable happens to
> | enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
> | unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
>
> and the reason is that the zombie tasks do not have task::mm and in resut
> we're obtaining -EPERM when trying to read task->exit_code from
> /proc/pid/stat.

Hmm. As I read the code exit_code should be returned to userspace as a
0. It does not look to me as if userspace should see an error in
that case.

> Looking into commit I suspect when mm = NULL we've to move back the test
> ptrace_has_cap(__task_cred(task)->user_ns, mode)?

Maybe.

We might want to consider if these lines from do_task_stat make
any sense.

if (permitted)
seq_put_decimal_ll(m, " ", task->exit_code);
else
seq_puts(m, " 0");

Looking at the code. Nothing changes behavior except for privileged
tasks looking at processes without an mm. So yes it may be sane
to tweak that part of the check.

AKA in the in for-next branch the code currenty says:
mm = task->mm;
if (!mm ||
((get_dumpable(mm) != SUID_DUMP_USER) &&
!ptrace_has_cap(mm->user_ns, mode)))
return -EPERM;

And in the case there is no mm there is no way to get
past returning -EPERM.

Looking at why we use ptrace_may_access in the exit_code case
I see a couple of relevant commits.

The commit that added the exit code check:

commit f83ce3e6b02d5e48b3a43b001390e2b58820389d
Author: Jake Edge <[email protected]>
Date: Mon May 4 12:51:14 2009 -0600

proc: avoid information leaks to non-privileged processes

By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
only allow processes that can ptrace() a given process to see information
that might be used to bypass address space layout randomization (ASLR).
These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
as the non-symbolic output from /proc/pid/wchan.

ASLR can be bypassed by sampling eip as shown by the proof-of-concept
code at http://code.google.com/p/fuzzyaslr/ As part of a presentation
(http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
also noted as possibly usable information leaks as well. The
start_stack address also leaks potentially useful information.

Cc: Stable Team <[email protected]>
Signed-off-by: Jake Edge <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


The change that started protecting start_code/end_code and
generally using these permissions to protect this class of information:

commit 5883f57ca0008ffc93e09cbb9847a1928e50c6f3
Author: Kees Cook <[email protected]>
Date: Wed Mar 23 16:42:53 2011 -0700

proc: protect mm start_code/end_code in /proc/pid/stat

While mm->start_stack was protected from cross-uid viewing (commit
f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged
processes")), the start_code and end_code values were not. This would
allow the text location of a PIE binary to leak, defeating ASLR.

Note that the value "1" is used instead of "0" for a protected value since
"ps", "killall", and likely other readers of /proc/pid/stat, take
start_code of "0" to mean a kernel thread and will misbehave. Thanks to
Brad Spengler for pointing this out.

Addresses CVE-2011-0726

Signed-off-by: Kees Cook <[email protected]>
Cc: <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: David Howells <[email protected]>
Cc: Eugene Teo <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Brad Spengler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


The commit that added task->exit_code:

commit 5b172087f99189416d5f47fd7ab5e6fb762a9ba3
Author: Cyrill Gorcunov <[email protected]>
Date: Thu May 31 16:26:44 2012 -0700

c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat

We would like to have an ability to restore command line arguments and
program environment pointers but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat. The exit_code is needed to
restore zombie tasks.

Signed-off-by: Cyrill Gorcunov <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Vagin <[email protected]>
Cc: Vasiliy Kulikov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


Looking at do_task_stat everything else that requires permitted
in do_tack_stat is an address. exit_code is something else so
I am not at all certain the ptrace_may_access permission check
makes sense.

A process without an mm is fundamentally undumpable so an error should
be returned in any case. So I don't see any harm in failing
ptrace_may_access in general. At the same time I can see how not
preserving the existing behavior is problematic.

So I am probably going to tweak the !mm case so that instead of failing
we perform the old capable check in that case. That seems the mot
certain way to avoid regressions. With that said, why is exit_code
behind a ptrace_may_access permission check?

Eric

2016-10-24 20:29:31

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

On Mon, Oct 24, 2016 at 02:01:30PM -0500, Eric W. Biederman wrote:
>
> Adding the containers list because that is the general place for these
> kinds of discussions.

Thanks!

> Cyrill Gorcunov <[email protected]> writes:
>
> > Hi Eric! A few days ago we've noticed that our zombie00 test case started
> > failing: https://ci.openvz.org/job/CRIU/view/All/job/CRIU-linux-next/406/console
>
> > ---
> > ======================== Run zdtm/static/zombie00 in h =========================
> > Start test
> > ./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
> > Run criu dump
> > Run criu restore
> > Send the 15 signal to 30
> > Wait for zdtm/static/zombie00(30) to die for 0.100000
> > ################ Test zdtm/static/zombie00 FAIL at result check ################
> >
> > I've narrowed problem down to commit
> >
> > | From ce99dd5fd5f600f9f4f0d37bb8847c1cb7c6e4fc Mon Sep 17 00:00:00 2001
> > | From: "Eric W. Biederman" <[email protected]>
> > | Date: Thu, 13 Oct 2016 21:23:16 -0500
> > | Subject: [PATCH] mm: Add a user_ns owner to mm_struct and fix
> > | ptrace_may_access
> > |
> > | During exec dumpable is cleared if the file that is being executed is
> > | not readable by the user executing the file. A bug in
> > | ptrace_may_access allows reading the file if the executable happens to
> > | enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
> > | unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
> >
> > and the reason is that the zombie tasks do not have task::mm and in resut
> > we're obtaining -EPERM when trying to read task->exit_code from
> > /proc/pid/stat.
>
> Hmm. As I read the code exit_code should be returned to userspace as a
> 0. It does not look to me as if userspace should see an error in
> that case.

I mean the ptrace-check returns -EPERM and we don't see @exit_code.
Sorry for confusion.

>
> > Looking into commit I suspect when mm = NULL we've to move back the test
> > ptrace_has_cap(__task_cred(task)->user_ns, mode)?
>
> Maybe.
>
> We might want to consider if these lines from do_task_stat make
> any sense.
>
> if (permitted)
> seq_put_decimal_ll(m, " ", task->exit_code);
> else
> seq_puts(m, " 0");
>
> Looking at the code. Nothing changes behavior except for privileged
> tasks looking at processes without an mm. So yes it may be sane
> to tweak that part of the check.

I think so, otherwise we might break api.

> AKA in the in for-next branch the code currenty says:
> mm = task->mm;
> if (!mm ||
> ((get_dumpable(mm) != SUID_DUMP_USER) &&
> !ptrace_has_cap(mm->user_ns, mode)))
> return -EPERM;
>
> And in the case there is no mm there is no way to get
> past returning -EPERM.
>
> Looking at why we use ptrace_may_access in the exit_code case
> I see a couple of relevant commits.
...
>
> The commit that added task->exit_code:
>
> commit 5b172087f99189416d5f47fd7ab5e6fb762a9ba3
> Author: Cyrill Gorcunov <[email protected]>
> Date: Thu May 31 16:26:44 2012 -0700
>
> c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
>
> We would like to have an ability to restore command line arguments and
> program environment pointers but first we need to obtain them somehow.
> Thus we put these values into /proc/$pid/stat. The exit_code is needed to
> restore zombie tasks.
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> Acked-by: Kees Cook <[email protected]>
> Cc: Pavel Emelyanov <[email protected]>
> Cc: Serge Hallyn <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Cc: Alexey Dobriyan <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Andrew Vagin <[email protected]>
> Cc: Vasiliy Kulikov <[email protected]>
> Cc: Alexey Dobriyan <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>

Yes, I've been adding it for criu sake.

> Looking at do_task_stat everything else that requires permitted
> in do_tack_stat is an address. exit_code is something else so
> I am not at all certain the ptrace_may_access permission check
> makes sense.

Well, I suspect @exit_code may be suitable for attacker to find
out if some address accessed cause sigsevg or something like that.

>
> A process without an mm is fundamentally undumpable so an error should
> be returned in any case. So I don't see any harm in failing
> ptrace_may_access in general. At the same time I can see how not
> preserving the existing behavior is problematic.
>
> So I am probably going to tweak the !mm case so that instead of failing
> we perform the old capable check in that case. That seems the mot
> certain way to avoid regressions. With that said, why is exit_code
> behind a ptrace_may_access permission check?

Yes, this would be great! And as to @exit_code I think better ask
Kees, CC'ed.

Cyrill

2016-10-24 21:32:12

by Kees Cook

[permalink] [raw]
Subject: Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

On Mon, Oct 24, 2016 at 1:29 PM, Cyrill Gorcunov <[email protected]> wrote:
> On Mon, Oct 24, 2016 at 02:01:30PM -0500, Eric W. Biederman wrote:
>> So I am probably going to tweak the !mm case so that instead of failing
>> we perform the old capable check in that case. That seems the mot
>> certain way to avoid regressions. With that said, why is exit_code
>> behind a ptrace_may_access permission check?
>
> Yes, this would be great! And as to @exit_code I think better ask
> Kees, CC'ed.

My concern was that this was an exposure in the sense that it is
internal program state that isn't visible through other means (without
being the parent, for example). Under the ptrace check, it has an
equivalency that seemed correct at the time.

As already covered, I'd agree: it looks like ce99dd5fd5f6 accidentally
added a dependency on task->mm where it didn't before. That section of
logic was entirely around dumpability, not an mm existing. It should
be "EPERM if mm and dumpable != SUID_DUMP_USER".

That said, I'd also agree that ptrace against no mm is crazy (though I
suppose that should return EINVAL or ESRCH or something), so perhaps a
better access control on @exit_code is needed here.

-Kees

--
Kees Cook
Nexus Security

2016-10-24 23:13:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

Kees Cook <[email protected]> writes:

> On Mon, Oct 24, 2016 at 1:29 PM, Cyrill Gorcunov <[email protected]> wrote:
>> On Mon, Oct 24, 2016 at 02:01:30PM -0500, Eric W. Biederman wrote:
>>> So I am probably going to tweak the !mm case so that instead of failing
>>> we perform the old capable check in that case. That seems the mot
>>> certain way to avoid regressions. With that said, why is exit_code
>>> behind a ptrace_may_access permission check?
>>
>> Yes, this would be great! And as to @exit_code I think better ask
>> Kees, CC'ed.
>
> My concern was that this was an exposure in the sense that it is
> internal program state that isn't visible through other means (without
> being the parent, for example). Under the ptrace check, it has an
> equivalency that seemed correct at the time.
>
> As already covered, I'd agree: it looks like ce99dd5fd5f6 accidentally
> added a dependency on task->mm where it didn't before. That section of
> logic was entirely around dumpability, not an mm existing. It should
> be "EPERM if mm and dumpable != SUID_DUMP_USER".
>
> That said, I'd also agree that ptrace against no mm is crazy (though I
> suppose that should return EINVAL or ESRCH or something), so perhaps a
> better access control on @exit_code is needed here.

I traced down the original logic of why we had that dumpable
variable, and it was ancient conservative on my part when we started
using the ptrace permission checks for proc files.

That same conservatism has resulted in the regression under
discussion.

Given that we already have a very full set of permission checks
separate from dumpable in ptrace_may_access I think it makes sense
to simply ignore dumpable when there is no mm.
AKA:
mm = task->mm;
if (mm &&
((get_dumpable(mm) != SUID_DUMP_USER) &&
!ptrace_has_cap(mm->user_ns, mode)))
return -EPERM;

Because while it has been used for other things dumpable is
fundamentally about do you have permission to read the mm.
So there is no real point in permission checks that protect
the mm if the mm has gone away.

It also looks like I may need to update the check that sets
PT_PTRACE_CAP to look at mm->user_ns as well.

Eric




2016-10-25 09:02:32

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

On Mon, Oct 24, 2016 at 06:11:04PM -0500, Eric W. Biederman wrote:
> Kees Cook <[email protected]> writes:
>
> > On Mon, Oct 24, 2016 at 1:29 PM, Cyrill Gorcunov <[email protected]> wrote:
> >> On Mon, Oct 24, 2016 at 02:01:30PM -0500, Eric W. Biederman wrote:
> >>> So I am probably going to tweak the !mm case so that instead of failing
> >>> we perform the old capable check in that case. That seems the mot
> >>> certain way to avoid regressions. With that said, why is exit_code
> >>> behind a ptrace_may_access permission check?
> >>
> >> Yes, this would be great! And as to @exit_code I think better ask
> >> Kees, CC'ed.
> >
> > My concern was that this was an exposure in the sense that it is
> > internal program state that isn't visible through other means (without
> > being the parent, for example). Under the ptrace check, it has an
> > equivalency that seemed correct at the time.
> >
> > As already covered, I'd agree: it looks like ce99dd5fd5f6 accidentally
> > added a dependency on task->mm where it didn't before. That section of
> > logic was entirely around dumpability, not an mm existing. It should
> > be "EPERM if mm and dumpable != SUID_DUMP_USER".
> >
> > That said, I'd also agree that ptrace against no mm is crazy (though I
> > suppose that should return EINVAL or ESRCH or something), so perhaps a
> > better access control on @exit_code is needed here.
>
> I traced down the original logic of why we had that dumpable
> variable, and it was ancient conservative on my part when we started
> using the ptrace permission checks for proc files.
>
> That same conservatism has resulted in the regression under
> discussion.
>
> Given that we already have a very full set of permission checks
> separate from dumpable in ptrace_may_access I think it makes sense
> to simply ignore dumpable when there is no mm.
> AKA:
> mm = task->mm;
> if (mm &&
> ((get_dumpable(mm) != SUID_DUMP_USER) &&
> !ptrace_has_cap(mm->user_ns, mode)))
> return -EPERM;
>
> Because while it has been used for other things dumpable is
> fundamentally about do you have permission to read the mm.
> So there is no real point in permission checks that protect
> the mm if the mm has gone away.
>
> It also looks like I may need to update the check that sets
> PT_PTRACE_CAP to look at mm->user_ns as well.

Thanks a lot for informative explanations, Eric and Kees!
Eric, if you make some patch please ping me to test it.

2016-10-27 15:57:07

by Eric W. Biederman

[permalink] [raw]
Subject: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks


During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file. A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue. task->cred->user_ns is always the same
as or descendent of mm->user_ns. Which guarantees that having
CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
credentials.

Cc: [email protected]
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---

Jann or anyone who looked at my last version of this that I thought
was ready to go, I have change the test in ptrace_may_access from:
if (!mm ||
((get_dumpable(mm) != SUID_DUMP_USER) &&
!ptrace_has_cap(mm->user_ns, mode)))
return -EPERM;
to:
if (mm &&
((get_dumpable(mm) != SUID_DUMP_USER) &&
!ptrace_has_cap(mm->user_ns, mode)))
return -EPERM;

As enforcing the dumpablity check without an mm caused a regression for
Cyrill (he could not read task->exit code of a zomebie even with a full
set of caps).

Further I have moved the setting of PT_PTRACE_CAP up in ptrace_attach
so that I can easily use the mm->user_ns.

I can't imagine either of these changes making a practical difference
to anyone but I am calling them out in case someone can.

include/linux/mm_types.h | 1 +
kernel/fork.c | 9 ++++++---
kernel/ptrace.c | 26 +++++++++++---------------
mm/init-mm.c | 2 ++
4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8acedf4b7d..08d947fc4c59 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -473,6 +473,7 @@ struct mm_struct {
*/
struct task_struct __rcu *owner;
#endif
+ struct user_namespace *user_ns;

/* store ref to file /proc/<pid>/exe symlink points to */
struct file __rcu *exe_file;
diff --git a/kernel/fork.c b/kernel/fork.c
index 623259fc794d..fd85c68c2791 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -742,7 +742,8 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
#endif
}

-static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
+static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
+ struct user_namespace *user_ns)
{
mm->mmap = NULL;
mm->mm_rb = RB_ROOT;
@@ -782,6 +783,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
if (init_new_context(p, mm))
goto fail_nocontext;

+ mm->user_ns = get_user_ns(user_ns);
return mm;

fail_nocontext:
@@ -827,7 +829,7 @@ struct mm_struct *mm_alloc(void)
return NULL;

memset(mm, 0, sizeof(*mm));
- return mm_init(mm, current);
+ return mm_init(mm, current, current_user_ns());
}

/*
@@ -842,6 +844,7 @@ void __mmdrop(struct mm_struct *mm)
destroy_context(mm);
mmu_notifier_mm_destroy(mm);
check_mm(mm);
+ put_user_ns(mm->user_ns);
free_mm(mm);
}
EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1123,7 +1126,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)

memcpy(mm, oldmm, sizeof(*mm));

- if (!mm_init(mm, tsk))
+ if (!mm_init(mm, tsk, mm->user_ns))
goto fail_nomem;

err = dup_mmap(mm, oldmm);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 2a99027312a6..44a25a1e6e83 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -220,7 +220,7 @@ static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
{
const struct cred *cred = current_cred(), *tcred;
- int dumpable = 0;
+ struct mm_struct *mm;
kuid_t caller_uid;
kgid_t caller_gid;

@@ -271,16 +271,11 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
return -EPERM;
ok:
rcu_read_unlock();
- smp_rmb();
- if (task->mm)
- dumpable = get_dumpable(task->mm);
- rcu_read_lock();
- if (dumpable != SUID_DUMP_USER &&
- !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
- rcu_read_unlock();
- return -EPERM;
- }
- rcu_read_unlock();
+ mm = task->mm;
+ if (mm &&
+ ((get_dumpable(mm) != SUID_DUMP_USER) &&
+ !ptrace_has_cap(mm->user_ns, mode)))
+ return -EPERM;

return security_ptrace_access_check(task, mode);
}
@@ -331,6 +326,11 @@ static int ptrace_attach(struct task_struct *task, long request,

task_lock(task);
retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+ if (!retval) {
+ struct mm_struct *mm = task->mm;
+ if (mm && ns_capable(mm->user_ns, CAP_SYS_PTRACE))
+ flags |= PT_PTRACE_CAP;
+ }
task_unlock(task);
if (retval)
goto unlock_creds;
@@ -344,10 +344,6 @@ static int ptrace_attach(struct task_struct *task, long request,

if (seize)
flags |= PT_SEIZED;
- rcu_read_lock();
- if (ns_capable(__task_cred(task)->user_ns, CAP_SYS_PTRACE))
- flags |= PT_PTRACE_CAP;
- rcu_read_unlock();
task->ptrace = flags;

__ptrace_link(task, current);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index a56a851908d2..975e49f00f34 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -6,6 +6,7 @@
#include <linux/cpumask.h>

#include <linux/atomic.h>
+#include <linux/user_namespace.h>
#include <asm/pgtable.h>
#include <asm/mmu.h>

@@ -21,5 +22,6 @@ struct mm_struct init_mm = {
.mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem),
.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
+ .user_ns = &init_user_ns,
INIT_MM_CONTEXT(init_mm)
};
--
2.10.1

2016-10-27 21:27:06

by Kees Cook

[permalink] [raw]
Subject: Re: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

On Thu, Oct 27, 2016 at 8:54 AM, Eric W. Biederman
<[email protected]> wrote:
>
> During exec dumpable is cleared if the file that is being executed is
> not readable by the user executing the file. A bug in
> ptrace_may_access allows reading the file if the executable happens to
> enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
> unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
>
> This problem is fixed with only necessary userspace breakage by adding
> a user namespace owner to mm_struct, captured at the time of exec, so
> it is clear in which user namespace CAP_SYS_PTRACE must be present in
> to be able to safely give read permission to the executable.
>
> The function ptrace_may_access is modified to verify that the ptracer
> has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
> This ensures that if the task changes it's cred into a subordinate
> user namespace it does not become ptraceable.
>
> The function ptrace_attach is modified to only set PT_PTRACE_CAP when
> CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
> PT_PTRACE_CAP is to be a flag to note that whatever permission changes
> the task might go through the tracer has sufficient permissions for
> it not to be an issue. task->cred->user_ns is always the same
> as or descendent of mm->user_ns. Which guarantees that having
> CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
> credentials.
>
> Cc: [email protected]
> Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Thanks!

Acked-by: Kees Cook <[email protected]>

-Kees

--
Kees Cook
Nexus Security

2016-10-27 21:39:23

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
>
>
> I can't imagine either of these changes making a practical difference
> to anyone but I am calling them out in case someone can.
>
> include/linux/mm_types.h | 1 +
> kernel/fork.c | 9 ++++++---
> kernel/ptrace.c | 26 +++++++++++---------------
> mm/init-mm.c | 2 ++
> 4 files changed, 20 insertions(+), 18 deletions(-)

Thanks a huge, Eric! And really sorry for delay in response,
I managed to miss this quite important mail for me in mail
storm. Gonna test it and will write you the results. Overall looks
great, but better be sure and run the tests.

Reviewed-by: Cyrill Gorcunov <[email protected]>

2016-10-27 22:34:35

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

On Fri, Oct 28, 2016 at 12:39:18AM +0300, Cyrill Gorcunov wrote:
> On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
> >
> >
> > I can't imagine either of these changes making a practical difference
> > to anyone but I am calling them out in case someone can.
> >
> > include/linux/mm_types.h | 1 +
> > kernel/fork.c | 9 ++++++---
> > kernel/ptrace.c | 26 +++++++++++---------------
> > mm/init-mm.c | 2 ++
> > 4 files changed, 20 insertions(+), 18 deletions(-)
>
> Thanks a huge, Eric! And really sorry for delay in response,
> I managed to miss this quite important mail for me in mail
> storm. Gonna test it and will write you the results. Overall looks
> great, but better be sure and run the tests.
>
> Reviewed-by: Cyrill Gorcunov <[email protected]>

Eric, on which kernel the patch is on top of?
It doesn't apply on linux-next for some reason.

| Date: Thu Oct 27 14:21:59 2016 +1100
|
| Add linux-next specific files for 20161027
|
| Signed-off-by: Stephen Rothwell <[email protected]>

I applied it on Linus' master and tests passed fine
(but they were passing fine even without the patch,
only linux-next failed).

2016-10-28 02:24:57

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

Cyrill Gorcunov <[email protected]> writes:

> On Fri, Oct 28, 2016 at 12:39:18AM +0300, Cyrill Gorcunov wrote:
>> On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
>> >
>> >
>> > I can't imagine either of these changes making a practical difference
>> > to anyone but I am calling them out in case someone can.
>> >
>> > include/linux/mm_types.h | 1 +
>> > kernel/fork.c | 9 ++++++---
>> > kernel/ptrace.c | 26 +++++++++++---------------
>> > mm/init-mm.c | 2 ++
>> > 4 files changed, 20 insertions(+), 18 deletions(-)
>>
>> Thanks a huge, Eric! And really sorry for delay in response,
>> I managed to miss this quite important mail for me in mail
>> storm. Gonna test it and will write you the results. Overall looks
>> great, but better be sure and run the tests.
>>
>> Reviewed-by: Cyrill Gorcunov <[email protected]>
>
> Eric, on which kernel the patch is on top of?
> It doesn't apply on linux-next for some reason.
>
> | Date: Thu Oct 27 14:21:59 2016 +1100
> |
> | Add linux-next specific files for 20161027
> |
> | Signed-off-by: Stephen Rothwell <[email protected]>
>
> I applied it on Linus' master and tests passed fine
> (but they were passing fine even without the patch,
> only linux-next failed).

Odd. I don't think I have taken the old version out of
linux-next yet. So you can probably revert the old version out of
linux-next and apply this one. All of my development at this point is
against v4.9-rc1.

I suspect you will find my last version on top of against v4.9-rc1 will
pass. Since my tree is only one deep and I don't think anyone except
linux-next is based on it, I plan to drop and readd this patch.
Especially since it is candidate for backporting.

Eric

2016-10-28 04:47:57

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

[email protected] (Eric W. Biederman) writes:

> Cyrill Gorcunov <[email protected]> writes:
>
>> On Fri, Oct 28, 2016 at 12:39:18AM +0300, Cyrill Gorcunov wrote:
>>> On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
>>> >
>>> >
>>> > I can't imagine either of these changes making a practical difference
>>> > to anyone but I am calling them out in case someone can.
>>> >
>>> > include/linux/mm_types.h | 1 +
>>> > kernel/fork.c | 9 ++++++---
>>> > kernel/ptrace.c | 26 +++++++++++---------------
>>> > mm/init-mm.c | 2 ++
>>> > 4 files changed, 20 insertions(+), 18 deletions(-)
>>>
>>> Thanks a huge, Eric! And really sorry for delay in response,
>>> I managed to miss this quite important mail for me in mail
>>> storm. Gonna test it and will write you the results. Overall looks
>>> great, but better be sure and run the tests.
>>>
>>> Reviewed-by: Cyrill Gorcunov <[email protected]>
>>
>> Eric, on which kernel the patch is on top of?
>> It doesn't apply on linux-next for some reason.
>>
>> | Date: Thu Oct 27 14:21:59 2016 +1100
>> |
>> | Add linux-next specific files for 20161027
>> |
>> | Signed-off-by: Stephen Rothwell <[email protected]>
>>
>> I applied it on Linus' master and tests passed fine
>> (but they were passing fine even without the patch,
>> only linux-next failed).
>
> Odd. I don't think I have taken the old version out of
> linux-next yet. So you can probably revert the old version out of
> linux-next and apply this one. All of my development at this point is
> against v4.9-rc1.
>
> I suspect you will find my last version on top of against v4.9-rc1 will
> pass. Since my tree is only one deep and I don't think anyone except
> linux-next is based on it, I plan to drop and readd this patch.
> Especially since it is candidate for backporting.

Mind if I add your tested-by?

To see Linus's tree fail with my patch you can apply the patch below.
That is the essence of what I changed to fix things. Just ignoring
dumpable when an mm exists.

Eric

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 44a25a1e6e83..b53983ee3f03 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -272,7 +272,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
ok:
rcu_read_unlock();
mm = task->mm;
- if (mm &&
+ if (!mm ||
((get_dumpable(mm) != SUID_DUMP_USER) &&
!ptrace_has_cap(mm->user_ns, mode)))
return -EPERM;

2016-10-28 07:06:41

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

On Thu, Oct 27, 2016 at 11:45:37PM -0500, Eric W. Biederman wrote:
>
> Mind if I add your tested-by?
>
> To see Linus's tree fail with my patch you can apply the patch below.
> That is the essence of what I changed to fix things. Just ignoring
> dumpable when an mm exists.

Tested-by: Cyrill Gorcunov <[email protected]>

Thanks a huge!