2014-04-02 17:21:05

by Serge Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Hi Eric,

(sorry, I don't seem to have the email I actually wanted to reply
to in my mbox, but it is
https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)

You'd said,
> Someone needs to read and think through all of the corner cases and see
> if we can ever have a time when task_dumpable is false but root in the
> container would not or should not be able to see everything.
>
> In particular I am worried about the case of a setuid app calling setns,
> and entering a lesser privileged user namespace. In my foggy mind that
> might be a security problem. And there might be other similar crazy
> cases.

Can we make use of current->mm->exe_file->f_cred->user_ns?

So either always use
make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
instead of make_kuid(cred->user_ns, 0), or check that
(current->mm->exe_file->f_cred->user_ns == cred->user_ns)
and, if not, assume that the caller has done a setns?

-serge


2014-04-02 17:32:51

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

(Sorry - the lxc-devel list has moved, so replying to all with the
correct list address; please reply to this rather than my previous
email)

Quoting Serge Hallyn ([email protected]):
> Hi Eric,
>
> (sorry, I don't seem to have the email I actually wanted to reply
> to in my mbox, but it is
> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
>
> You'd said,
> > Someone needs to read and think through all of the corner cases and see
> > if we can ever have a time when task_dumpable is false but root in the
> > container would not or should not be able to see everything.
> >
> > In particular I am worried about the case of a setuid app calling setns,
> > and entering a lesser privileged user namespace. In my foggy mind that
> > might be a security problem. And there might be other similar crazy
> > cases.
>
> Can we make use of current->mm->exe_file->f_cred->user_ns?
>
> So either always use
> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
> instead of make_kuid(cred->user_ns, 0), or check that
> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
> and, if not, assume that the caller has done a setns?
>
> -serge
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-04-04 18:13:54

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
> (Sorry - the lxc-devel list has moved, so replying to all with the
> correct list address; please reply to this rather than my previous
> email)
>
> Quoting Serge Hallyn ([email protected]):
>> Hi Eric,
>>
>> (sorry, I don't seem to have the email I actually wanted to reply
>> to in my mbox, but it is
>> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
>>
>> You'd said,
>>> Someone needs to read and think through all of the corner cases and see
>>> if we can ever have a time when task_dumpable is false but root in the
>>> container would not or should not be able to see everything.
>>>
>>> In particular I am worried about the case of a setuid app calling setns,
>>> and entering a lesser privileged user namespace. In my foggy mind that
>>> might be a security problem. And there might be other similar crazy
>>> cases.
>>
>> Can we make use of current->mm->exe_file->f_cred->user_ns?
>>
>> So either always use
>> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
>> instead of make_kuid(cred->user_ns, 0), or check that
>> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
>> and, if not, assume that the caller has done a setns?

Do you have a summary of the issue? I'm a little lost here.

I suspect that what we really need is to revoke a bunch of proc files
every time a task does anything involving setuid (or, more generally,
any of the LSM_UNSAFE_PTRACE things).

--Andy

2014-04-04 18:30:37

by Serge Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Quoting Andy Lutomirski ([email protected]):
> On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
> > (Sorry - the lxc-devel list has moved, so replying to all with the
> > correct list address; please reply to this rather than my previous
> > email)
> >
> > Quoting Serge Hallyn ([email protected]):
> >> Hi Eric,
> >>
> >> (sorry, I don't seem to have the email I actually wanted to reply
> >> to in my mbox, but it is
> >> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
> >>
> >> You'd said,
> >>> Someone needs to read and think through all of the corner cases and see
> >>> if we can ever have a time when task_dumpable is false but root in the
> >>> container would not or should not be able to see everything.
> >>>
> >>> In particular I am worried about the case of a setuid app calling setns,
> >>> and entering a lesser privileged user namespace. In my foggy mind that
> >>> might be a security problem. And there might be other similar crazy
> >>> cases.
> >>
> >> Can we make use of current->mm->exe_file->f_cred->user_ns?
> >>
> >> So either always use
> >> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
> >> instead of make_kuid(cred->user_ns, 0), or check that
> >> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
> >> and, if not, assume that the caller has done a setns?
>
> Do you have a summary of the issue? I'm a little lost here.

Sure - when running an unprivileged container, tasks which become
!dumpable end up with /proc/$pid/fd/ being owned by the global
root user, which inside the container is nobody:nogroup. Examples
are the user's sshd threads and apache, and in the past I think I've
seen it with logind or getty too.

> I suspect that what we really need is to revoke a bunch of proc files
> every time a task does anything involving setuid (or, more generally,
> any of the LSM_UNSAFE_PTRACE things).

setuid, or do you mean setns? In any case, I'm not thinking through
attach (setns'ing into a container) yet, but the cases I'm looking at
right now are just a root daemon - already inside the non-init user
ns - doing something to become !dumpable, and having its fds become
owned by GLOBAL_ROOT_UID. Since these tasks are running a program
which came from inside the non-init userns, I think it's sane to
allow root in the non-init userns own any coredumps.

Whereas if the program had started as /bin/passwd in the init userns,
then coredumps (and /proc/$$/fd/*) should be owned by the GLOBAL_ROOT_UID.

-serge

2014-04-04 19:03:24

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On Fri, Apr 4, 2014 at 11:30 AM, Serge Hallyn <[email protected]> wrote:
> Quoting Andy Lutomirski ([email protected]):
>> On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
>> > (Sorry - the lxc-devel list has moved, so replying to all with the
>> > correct list address; please reply to this rather than my previous
>> > email)
>> >
>> > Quoting Serge Hallyn ([email protected]):
>> >> Hi Eric,
>> >>
>> >> (sorry, I don't seem to have the email I actually wanted to reply
>> >> to in my mbox, but it is
>> >> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
>> >>
>> >> You'd said,
>> >>> Someone needs to read and think through all of the corner cases and see
>> >>> if we can ever have a time when task_dumpable is false but root in the
>> >>> container would not or should not be able to see everything.
>> >>>
>> >>> In particular I am worried about the case of a setuid app calling setns,
>> >>> and entering a lesser privileged user namespace. In my foggy mind that
>> >>> might be a security problem. And there might be other similar crazy
>> >>> cases.
>> >>
>> >> Can we make use of current->mm->exe_file->f_cred->user_ns?
>> >>
>> >> So either always use
>> >> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
>> >> instead of make_kuid(cred->user_ns, 0), or check that
>> >> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
>> >> and, if not, assume that the caller has done a setns?
>>
>> Do you have a summary of the issue? I'm a little lost here.
>
> Sure - when running an unprivileged container, tasks which become
> !dumpable end up with /proc/$pid/fd/ being owned by the global
> root user, which inside the container is nobody:nogroup. Examples
> are the user's sshd threads and apache, and in the past I think I've
> seen it with logind or getty too.

Other than the aesthetics, why does this matter? Things in the
container who are actually mapped to nobody still can't access those
files?

The alternative (using the container's owner) sounds a bit scary.

>
>> I suspect that what we really need is to revoke a bunch of proc files
>> every time a task does anything involving setuid (or, more generally,
>> any of the LSM_UNSAFE_PTRACE things).
>
> setuid, or do you mean setns? In any case, I'm not thinking through
> attach (setns'ing into a container) yet, but the cases I'm looking at
> right now are just a root daemon - already inside the non-init user
> ns - doing something to become !dumpable, and having its fds become
> owned by GLOBAL_ROOT_UID. Since these tasks are running a program
> which came from inside the non-init userns, I think it's sane to
> allow root in the non-init userns own any coredumps.
>
> Whereas if the program had started as /bin/passwd in the init userns,
> then coredumps (and /proc/$$/fd/*) should be owned by the GLOBAL_ROOT_UID.

Gack.

This is kind of the same problem as the ptrace issue in the credfd
thread. Sigh.

--Andy

2014-04-04 19:10:16

by Serge Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Quoting Andy Lutomirski ([email protected]):
> On Fri, Apr 4, 2014 at 11:30 AM, Serge Hallyn <[email protected]> wrote:
> > Quoting Andy Lutomirski ([email protected]):
> >> On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
> >> > (Sorry - the lxc-devel list has moved, so replying to all with the
> >> > correct list address; please reply to this rather than my previous
> >> > email)
> >> >
> >> > Quoting Serge Hallyn ([email protected]):
> >> >> Hi Eric,
> >> >>
> >> >> (sorry, I don't seem to have the email I actually wanted to reply
> >> >> to in my mbox, but it is
> >> >> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
> >> >>
> >> >> You'd said,
> >> >>> Someone needs to read and think through all of the corner cases and see
> >> >>> if we can ever have a time when task_dumpable is false but root in the
> >> >>> container would not or should not be able to see everything.
> >> >>>
> >> >>> In particular I am worried about the case of a setuid app calling setns,
> >> >>> and entering a lesser privileged user namespace. In my foggy mind that
> >> >>> might be a security problem. And there might be other similar crazy
> >> >>> cases.
> >> >>
> >> >> Can we make use of current->mm->exe_file->f_cred->user_ns?
> >> >>
> >> >> So either always use
> >> >> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
> >> >> instead of make_kuid(cred->user_ns, 0), or check that
> >> >> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
> >> >> and, if not, assume that the caller has done a setns?
> >>
> >> Do you have a summary of the issue? I'm a little lost here.
> >
> > Sure - when running an unprivileged container, tasks which become
> > !dumpable end up with /proc/$pid/fd/ being owned by the global
> > root user, which inside the container is nobody:nogroup. Examples
> > are the user's sshd threads and apache, and in the past I think I've
> > seen it with logind or getty too.
>
> Other than the aesthetics, why does this matter? Things in the
> container who are actually mapped to nobody still can't access those
> files?

Bc root cannot look at the fds.

> The alternative (using the container's owner) sounds a bit scary.

If the file being run belongs to the container, why would it be scary?
Bc some fds may have been not closed when the task did execve, where
the previous bprm file may have been on the host?

> >> I suspect that what we really need is to revoke a bunch of proc files
> >> every time a task does anything involving setuid (or, more generally,
> >> any of the LSM_UNSAFE_PTRACE things).
> >
> > setuid, or do you mean setns? In any case, I'm not thinking through
> > attach (setns'ing into a container) yet, but the cases I'm looking at
> > right now are just a root daemon - already inside the non-init user
> > ns - doing something to become !dumpable, and having its fds become
> > owned by GLOBAL_ROOT_UID. Since these tasks are running a program
> > which came from inside the non-init userns, I think it's sane to
> > allow root in the non-init userns own any coredumps.
> >
> > Whereas if the program had started as /bin/passwd in the init userns,
> > then coredumps (and /proc/$$/fd/*) should be owned by the GLOBAL_ROOT_UID.
>
> Gack.
>
> This is kind of the same problem as the ptrace issue in the credfd
> thread. Sigh.
>
> --Andy

2014-04-04 19:28:33

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On Fri, Apr 4, 2014 at 12:10 PM, Serge Hallyn <[email protected]> wrote:
> Quoting Andy Lutomirski ([email protected]):
>> On Fri, Apr 4, 2014 at 11:30 AM, Serge Hallyn <[email protected]> wrote:
>> > Quoting Andy Lutomirski ([email protected]):
>> >> On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
>> >> > (Sorry - the lxc-devel list has moved, so replying to all with the
>> >> > correct list address; please reply to this rather than my previous
>> >> > email)
>> >> >
>> >> > Quoting Serge Hallyn ([email protected]):
>> >> >> Hi Eric,
>> >> >>
>> >> >> (sorry, I don't seem to have the email I actually wanted to reply
>> >> >> to in my mbox, but it is
>> >> >> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
>> >> >>
>> >> >> You'd said,
>> >> >>> Someone needs to read and think through all of the corner cases and see
>> >> >>> if we can ever have a time when task_dumpable is false but root in the
>> >> >>> container would not or should not be able to see everything.
>> >> >>>
>> >> >>> In particular I am worried about the case of a setuid app calling setns,
>> >> >>> and entering a lesser privileged user namespace. In my foggy mind that
>> >> >>> might be a security problem. And there might be other similar crazy
>> >> >>> cases.
>> >> >>
>> >> >> Can we make use of current->mm->exe_file->f_cred->user_ns?
>> >> >>
>> >> >> So either always use
>> >> >> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
>> >> >> instead of make_kuid(cred->user_ns, 0), or check that
>> >> >> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
>> >> >> and, if not, assume that the caller has done a setns?
>> >>
>> >> Do you have a summary of the issue? I'm a little lost here.
>> >
>> > Sure - when running an unprivileged container, tasks which become
>> > !dumpable end up with /proc/$pid/fd/ being owned by the global
>> > root user, which inside the container is nobody:nogroup. Examples
>> > are the user's sshd threads and apache, and in the past I think I've
>> > seen it with logind or getty too.
>>
>> Other than the aesthetics, why does this matter? Things in the
>> container who are actually mapped to nobody still can't access those
>> files?
>
> Bc root cannot look at the fds.

Right. I guess this is a problem.

>
>> The alternative (using the container's owner) sounds a bit scary.
>
> If the file being run belongs to the container, why would it be scary?
> Bc some fds may have been not closed when the task did execve, where
> the previous bprm file may have been on the host?

Meh. I'm not worried about that case, and that one probably doesn't
cause !dumpable anyway. The nasty cases are unshare and setns.

I'm starting to think that we need to extend dumpable to something
much more general like a list of struct creds that someone needs to be
able to ptrace, *in addition to current creds* in order to access
sensitive /proc files, coredumps, etc. If you get started as setuid,
then you start with two struct creds in the list (or maybe just your
euid and uid). If you get started !setuid, then your initial creds
are in the list. It's possible that few or no things will need to
change that list after execve.

If all of the entries and current->cred are in the same user_ns, then
we can dump as userns root. If they're in different usernses, then we
dump as global root or maybe the common ancestor root.
setuid(getuid()) and other such nastiness may have to empty the list,
or maybe we can just use a prctl for that.

If this idea works, it would be straightforward to implement, it might
solve a number of problems.

--Andy

2014-04-07 18:13:34

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Quoting Andy Lutomirski ([email protected]):
> On Fri, Apr 4, 2014 at 12:10 PM, Serge Hallyn <[email protected]> wrote:
> > Quoting Andy Lutomirski ([email protected]):
> >> On Fri, Apr 4, 2014 at 11:30 AM, Serge Hallyn <[email protected]> wrote:
> >> > Quoting Andy Lutomirski ([email protected]):
> >> >> On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
> >> >> > (Sorry - the lxc-devel list has moved, so replying to all with the
> >> >> > correct list address; please reply to this rather than my previous
> >> >> > email)
> >> >> >
> >> >> > Quoting Serge Hallyn ([email protected]):
> >> >> >> Hi Eric,
> >> >> >>
> >> >> >> (sorry, I don't seem to have the email I actually wanted to reply
> >> >> >> to in my mbox, but it is
> >> >> >> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
> >> >> >>
> >> >> >> You'd said,
> >> >> >>> Someone needs to read and think through all of the corner cases and see
> >> >> >>> if we can ever have a time when task_dumpable is false but root in the
> >> >> >>> container would not or should not be able to see everything.
> >> >> >>>
> >> >> >>> In particular I am worried about the case of a setuid app calling setns,
> >> >> >>> and entering a lesser privileged user namespace. In my foggy mind that
> >> >> >>> might be a security problem. And there might be other similar crazy
> >> >> >>> cases.
> >> >> >>
> >> >> >> Can we make use of current->mm->exe_file->f_cred->user_ns?
> >> >> >>
> >> >> >> So either always use
> >> >> >> make_kgid(current->mm->exe_file->f_cred->user_ns, 0)
> >> >> >> instead of make_kuid(cred->user_ns, 0), or check that
> >> >> >> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
> >> >> >> and, if not, assume that the caller has done a setns?
> >> >>
> >> >> Do you have a summary of the issue? I'm a little lost here.
> >> >
> >> > Sure - when running an unprivileged container, tasks which become
> >> > !dumpable end up with /proc/$pid/fd/ being owned by the global
> >> > root user, which inside the container is nobody:nogroup. Examples
> >> > are the user's sshd threads and apache, and in the past I think I've
> >> > seen it with logind or getty too.
> >>
> >> Other than the aesthetics, why does this matter? Things in the
> >> container who are actually mapped to nobody still can't access those
> >> files?
> >
> > Bc root cannot look at the fds.
>
> Right. I guess this is a problem.
>
> >
> >> The alternative (using the container's owner) sounds a bit scary.
> >
> > If the file being run belongs to the container, why would it be scary?
> > Bc some fds may have been not closed when the task did execve, where
> > the previous bprm file may have been on the host?
>
> Meh. I'm not worried about that case, and that one probably doesn't
> cause !dumpable anyway. The nasty cases are unshare and setns.
>
> I'm starting to think that we need to extend dumpable to something
> much more general like a list of struct creds that someone needs to be
> able to ptrace, *in addition to current creds* in order to access
> sensitive /proc files, coredumps, etc. If you get started as setuid,

Hm, yeah, this sort of makes sense.

> then you start with two struct creds in the list (or maybe just your
> euid and uid). If you get started !setuid, then your initial creds
> are in the list. It's possible that few or no things will need to
> change that list after execve.
>
> If all of the entries and current->cred are in the same user_ns, then
> we can dump as userns root. If they're in different usernses, then we
> dump as global root or maybe the common ancestor root.
> setuid(getuid()) and other such nastiness may have to empty the list,
> or maybe we can just use a prctl for that.

A few questions,

1. is there any other action which would trigger adding a new cred to
the ist?

2. would execve clear (and re-init) the list of creds?

> If this idea works, it would be straightforward to implement, it might
> solve a number of problems.
>
> --Andy

2014-04-10 19:51:08

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Andy Lutomirski ([email protected]):
>> I'm starting to think that we need to extend dumpable to something
>> much more general like a list of struct creds that someone needs to be
>> able to ptrace, *in addition to current creds* in order to access
>> sensitive /proc files, coredumps, etc. If you get started as setuid,
>
> Hm, yeah, this sort of makes sense.
>
>> then you start with two struct creds in the list (or maybe just your
>> euid and uid). If you get started !setuid, then your initial creds
>> are in the list. It's possible that few or no things will need to
>> change that list after execve.
>>
>> If all of the entries and current->cred are in the same user_ns, then
>> we can dump as userns root. If they're in different usernses, then we
>> dump as global root or maybe the common ancestor root.
>> setuid(getuid()) and other such nastiness may have to empty the list,
>> or maybe we can just use a prctl for that.
>
> A few questions,
>
> 1. is there any other action which would trigger adding a new cred to
> the ist?

I don't think so. Anyone who can ptrace you from the start can
corrupt you such that you leak rights even if some future action
prevents new ptracers from attaching.

OTOH, it might be nice for something like an HTTPS server to be able
to fork and shove its private key into the child, while preventing
anyone from ptracing the child. But doing this securely without help
from someone with a different uid might be impossible anyway.

>
> 2. would execve clear (and re-init) the list of creds?

Probably. Thoughts?

We could have a way to ask execve not to reinit the list. Such a
mechanism would have to require no_new_privs to prevent a
straightforward attack on any setuid binary.

We's also want PR_SET_DUMPABLE or a new prctl to be able reset the
list to contain just current-.cred, I think.

--Andy

2014-04-11 21:52:31

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Quoting Andy Lutomirski ([email protected]):
> On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
> > Quoting Andy Lutomirski ([email protected]):
> >> I'm starting to think that we need to extend dumpable to something
> >> much more general like a list of struct creds that someone needs to be
> >> able to ptrace, *in addition to current creds* in order to access
> >> sensitive /proc files, coredumps, etc. If you get started as setuid,
> >
> > Hm, yeah, this sort of makes sense.
> >
> >> then you start with two struct creds in the list (or maybe just your
> >> euid and uid). If you get started !setuid, then your initial creds
> >> are in the list. It's possible that few or no things will need to
> >> change that list after execve.
> >>
> >> If all of the entries and current->cred are in the same user_ns, then
> >> we can dump as userns root. If they're in different usernses, then we
> >> dump as global root or maybe the common ancestor root.
> >> setuid(getuid()) and other such nastiness may have to empty the list,
> >> or maybe we can just use a prctl for that.
> >
> > A few questions,
> >
> > 1. is there any other action which would trigger adding a new cred to
> > the ist?
>
> I don't think so. Anyone who can ptrace you from the start can
> corrupt you such that you leak rights even if some future action
> prevents new ptracers from attaching.
>
> OTOH, it might be nice for something like an HTTPS server to be able
> to fork and shove its private key into the child, while preventing
> anyone from ptracing the child. But doing this securely without help
> from someone with a different uid might be impossible anyway.
>
> >
> > 2. would execve clear (and re-init) the list of creds?
>
> Probably. Thoughts?

Yeah it seems to me it should be re-initialized, with a cred added
to the list for every open fd.

> We could have a way to ask execve not to reinit the list. Such a
> mechanism would have to require no_new_privs to prevent a
> straightforward attack on any setuid binary.

If we don't add a cred for every open fd, then I think we need this. If
we do, then I'm not sure this makes sense.

> We's also want PR_SET_DUMPABLE or a new prctl to be able reset the
> list to contain just current-.cred, I think.

Yeah PR_SET_DUMPABLE should reset it, agreed.

-serge

2014-04-11 22:11:58

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On Fri, Apr 11, 2014 at 2:52 PM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Andy Lutomirski ([email protected]):
>> On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
>> > Quoting Andy Lutomirski ([email protected]):
>> >> I'm starting to think that we need to extend dumpable to something
>> >> much more general like a list of struct creds that someone needs to be
>> >> able to ptrace, *in addition to current creds* in order to access
>> >> sensitive /proc files, coredumps, etc. If you get started as setuid,
>> >
>> > Hm, yeah, this sort of makes sense.
>> >
>> >> then you start with two struct creds in the list (or maybe just your
>> >> euid and uid). If you get started !setuid, then your initial creds
>> >> are in the list. It's possible that few or no things will need to
>> >> change that list after execve.
>> >>
>> >> If all of the entries and current->cred are in the same user_ns, then
>> >> we can dump as userns root. If they're in different usernses, then we
>> >> dump as global root or maybe the common ancestor root.
>> >> setuid(getuid()) and other such nastiness may have to empty the list,
>> >> or maybe we can just use a prctl for that.
>> >
>> > A few questions,
>> >
>> > 1. is there any other action which would trigger adding a new cred to
>> > the ist?
>>
>> I don't think so. Anyone who can ptrace you from the start can
>> corrupt you such that you leak rights even if some future action
>> prevents new ptracers from attaching.
>>
>> OTOH, it might be nice for something like an HTTPS server to be able
>> to fork and shove its private key into the child, while preventing
>> anyone from ptracing the child. But doing this securely without help
>> from someone with a different uid might be impossible anyway.
>>
>> >
>> > 2. would execve clear (and re-init) the list of creds?
>>
>> Probably. Thoughts?
>
> Yeah it seems to me it should be re-initialized, with a cred added
> to the list for every open fd.

What do you mean "every fd"?

It seems odd to me that execve of anything that isn't setuid would add
anything to the list -- attackers can always ptrace before the execve
happens.

>
>> We could have a way to ask execve not to reinit the list. Such a
>> mechanism would have to require no_new_privs to prevent a
>> straightforward attack on any setuid binary.
>
> If we don't add a cred for every open fd, then I think we need this. If
> we do, then I'm not sure this makes sense.

See above. I think I'm misunderstanding you.

>
>> We's also want PR_SET_DUMPABLE or a new prctl to be able reset the
>> list to contain just current-.cred, I think.
>
> Yeah PR_SET_DUMPABLE should reset it, agreed.
>
> -serge

--Andy

2014-04-11 22:29:27

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Quoting Andy Lutomirski ([email protected]):
> On Fri, Apr 11, 2014 at 2:52 PM, Serge E. Hallyn <[email protected]> wrote:
> > Quoting Andy Lutomirski ([email protected]):
> >> On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
> >> > Quoting Andy Lutomirski ([email protected]):
> >> >> I'm starting to think that we need to extend dumpable to something
> >> >> much more general like a list of struct creds that someone needs to be
> >> >> able to ptrace, *in addition to current creds* in order to access
> >> >> sensitive /proc files, coredumps, etc. If you get started as setuid,
> >> >
> >> > Hm, yeah, this sort of makes sense.
> >> >
> >> >> then you start with two struct creds in the list (or maybe just your
> >> >> euid and uid). If you get started !setuid, then your initial creds
> >> >> are in the list. It's possible that few or no things will need to
> >> >> change that list after execve.
> >> >>
> >> >> If all of the entries and current->cred are in the same user_ns, then
> >> >> we can dump as userns root. If they're in different usernses, then we
> >> >> dump as global root or maybe the common ancestor root.
> >> >> setuid(getuid()) and other such nastiness may have to empty the list,
> >> >> or maybe we can just use a prctl for that.
> >> >
> >> > A few questions,
> >> >
> >> > 1. is there any other action which would trigger adding a new cred to
> >> > the ist?
> >>
> >> I don't think so. Anyone who can ptrace you from the start can
> >> corrupt you such that you leak rights even if some future action
> >> prevents new ptracers from attaching.
> >>
> >> OTOH, it might be nice for something like an HTTPS server to be able
> >> to fork and shove its private key into the child, while preventing
> >> anyone from ptracing the child. But doing this securely without help
> >> from someone with a different uid might be impossible anyway.
> >>
> >> >
> >> > 2. would execve clear (and re-init) the list of creds?
> >>
> >> Probably. Thoughts?
> >
> > Yeah it seems to me it should be re-initialized, with a cred added
> > to the list for every open fd.
>
> What do you mean "every fd"?
>
> It seems odd to me that execve of anything that isn't setuid would add
> anything to the list -- attackers can always ptrace before the execve
> happens.

Maybe you're right. Maybe I shouldn't reason about this on a friday
afternoon.

My *thought* was setuid-root program opens /etc/shadow, then execs a
regular program keeping that open. Attaching to that fails now though,
presumably due to dumpable.

-serge

2014-04-11 22:32:56

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On Fri, Apr 11, 2014 at 3:29 PM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Andy Lutomirski ([email protected]):
>> On Fri, Apr 11, 2014 at 2:52 PM, Serge E. Hallyn <[email protected]> wrote:
>> > Quoting Andy Lutomirski ([email protected]):
>> >> On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
>> >> > Quoting Andy Lutomirski ([email protected]):
>> >> >> I'm starting to think that we need to extend dumpable to something
>> >> >> much more general like a list of struct creds that someone needs to be
>> >> >> able to ptrace, *in addition to current creds* in order to access
>> >> >> sensitive /proc files, coredumps, etc. If you get started as setuid,
>> >> >
>> >> > Hm, yeah, this sort of makes sense.
>> >> >
>> >> >> then you start with two struct creds in the list (or maybe just your
>> >> >> euid and uid). If you get started !setuid, then your initial creds
>> >> >> are in the list. It's possible that few or no things will need to
>> >> >> change that list after execve.
>> >> >>
>> >> >> If all of the entries and current->cred are in the same user_ns, then
>> >> >> we can dump as userns root. If they're in different usernses, then we
>> >> >> dump as global root or maybe the common ancestor root.
>> >> >> setuid(getuid()) and other such nastiness may have to empty the list,
>> >> >> or maybe we can just use a prctl for that.
>> >> >
>> >> > A few questions,
>> >> >
>> >> > 1. is there any other action which would trigger adding a new cred to
>> >> > the ist?
>> >>
>> >> I don't think so. Anyone who can ptrace you from the start can
>> >> corrupt you such that you leak rights even if some future action
>> >> prevents new ptracers from attaching.
>> >>
>> >> OTOH, it might be nice for something like an HTTPS server to be able
>> >> to fork and shove its private key into the child, while preventing
>> >> anyone from ptracing the child. But doing this securely without help
>> >> from someone with a different uid might be impossible anyway.
>> >>
>> >> >
>> >> > 2. would execve clear (and re-init) the list of creds?
>> >>
>> >> Probably. Thoughts?
>> >
>> > Yeah it seems to me it should be re-initialized, with a cred added
>> > to the list for every open fd.
>>
>> What do you mean "every fd"?
>>
>> It seems odd to me that execve of anything that isn't setuid would add
>> anything to the list -- attackers can always ptrace before the execve
>> happens.
>
> Maybe you're right. Maybe I shouldn't reason about this on a friday
> afternoon.
>
> My *thought* was setuid-root program opens /etc/shadow, then execs a
> regular program keeping that open. Attaching to that fails now though,
> presumably due to dumpable.
>

Why would it fail? Isn't dumpable cleared on execve of a non-setuid
program? Maybe I need to look this stuff up again.

--Andy

2014-04-11 22:46:41

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

Quoting Andy Lutomirski ([email protected]):
> On Fri, Apr 11, 2014 at 3:29 PM, Serge E. Hallyn <[email protected]> wrote:
> > Quoting Andy Lutomirski ([email protected]):
> >> On Fri, Apr 11, 2014 at 2:52 PM, Serge E. Hallyn <[email protected]> wrote:
> >> > Quoting Andy Lutomirski ([email protected]):
> >> >> On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
> >> >> > Quoting Andy Lutomirski ([email protected]):
> >> >> >> I'm starting to think that we need to extend dumpable to something
> >> >> >> much more general like a list of struct creds that someone needs to be
> >> >> >> able to ptrace, *in addition to current creds* in order to access
> >> >> >> sensitive /proc files, coredumps, etc. If you get started as setuid,
> >> >> >
> >> >> > Hm, yeah, this sort of makes sense.
> >> >> >
> >> >> >> then you start with two struct creds in the list (or maybe just your
> >> >> >> euid and uid). If you get started !setuid, then your initial creds
> >> >> >> are in the list. It's possible that few or no things will need to
> >> >> >> change that list after execve.
> >> >> >>
> >> >> >> If all of the entries and current->cred are in the same user_ns, then
> >> >> >> we can dump as userns root. If they're in different usernses, then we
> >> >> >> dump as global root or maybe the common ancestor root.
> >> >> >> setuid(getuid()) and other such nastiness may have to empty the list,
> >> >> >> or maybe we can just use a prctl for that.
> >> >> >
> >> >> > A few questions,
> >> >> >
> >> >> > 1. is there any other action which would trigger adding a new cred to
> >> >> > the ist?
> >> >>
> >> >> I don't think so. Anyone who can ptrace you from the start can
> >> >> corrupt you such that you leak rights even if some future action
> >> >> prevents new ptracers from attaching.
> >> >>
> >> >> OTOH, it might be nice for something like an HTTPS server to be able
> >> >> to fork and shove its private key into the child, while preventing
> >> >> anyone from ptracing the child. But doing this securely without help
> >> >> from someone with a different uid might be impossible anyway.
> >> >>
> >> >> >
> >> >> > 2. would execve clear (and re-init) the list of creds?
> >> >>
> >> >> Probably. Thoughts?
> >> >
> >> > Yeah it seems to me it should be re-initialized, with a cred added
> >> > to the list for every open fd.
> >>
> >> What do you mean "every fd"?
> >>
> >> It seems odd to me that execve of anything that isn't setuid would add
> >> anything to the list -- attackers can always ptrace before the execve
> >> happens.
> >
> > Maybe you're right. Maybe I shouldn't reason about this on a friday
> > afternoon.
> >
> > My *thought* was setuid-root program opens /etc/shadow, then execs a
> > regular program keeping that open. Attaching to that fails now though,
> > presumably due to dumpable.
> >
>
> Why would it fail?

I had expected it to succeed when I tried it, but it did in fact fail.

> Isn't dumpable cleared on execve of a non-setuid
> program? Maybe I need to look this stuff up again.

I guess this particular case was handled by setup_new_exec:

if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
set_dumpable(current->mm, SUID_DUMP_USER);
else
set_dumpable(current->mm, suid_dumpable);

since my euid was 0 and uid 1000, when I did the exec.

-serge

2014-04-11 23:00:39

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [lxc-devel] Kernel bug? Setuid apps and user namespaces

On Fri, Apr 11, 2014 at 3:46 PM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Andy Lutomirski ([email protected]):
>> On Fri, Apr 11, 2014 at 3:29 PM, Serge E. Hallyn <[email protected]> wrote:
>> > Quoting Andy Lutomirski ([email protected]):
>> >> On Fri, Apr 11, 2014 at 2:52 PM, Serge E. Hallyn <[email protected]> wrote:
>> >> > Quoting Andy Lutomirski ([email protected]):
>> >> >> On Mon, Apr 7, 2014 at 11:13 AM, Serge E. Hallyn <[email protected]> wrote:
>> >> >> > Quoting Andy Lutomirski ([email protected]):
>> >> >> >> I'm starting to think that we need to extend dumpable to something
>> >> >> >> much more general like a list of struct creds that someone needs to be
>> >> >> >> able to ptrace, *in addition to current creds* in order to access
>> >> >> >> sensitive /proc files, coredumps, etc. If you get started as setuid,
>> >> >> >
>> >> >> > Hm, yeah, this sort of makes sense.
>> >> >> >
>> >> >> >> then you start with two struct creds in the list (or maybe just your
>> >> >> >> euid and uid). If you get started !setuid, then your initial creds
>> >> >> >> are in the list. It's possible that few or no things will need to
>> >> >> >> change that list after execve.
>> >> >> >>
>> >> >> >> If all of the entries and current->cred are in the same user_ns, then
>> >> >> >> we can dump as userns root. If they're in different usernses, then we
>> >> >> >> dump as global root or maybe the common ancestor root.
>> >> >> >> setuid(getuid()) and other such nastiness may have to empty the list,
>> >> >> >> or maybe we can just use a prctl for that.
>> >> >> >
>> >> >> > A few questions,
>> >> >> >
>> >> >> > 1. is there any other action which would trigger adding a new cred to
>> >> >> > the ist?
>> >> >>
>> >> >> I don't think so. Anyone who can ptrace you from the start can
>> >> >> corrupt you such that you leak rights even if some future action
>> >> >> prevents new ptracers from attaching.
>> >> >>
>> >> >> OTOH, it might be nice for something like an HTTPS server to be able
>> >> >> to fork and shove its private key into the child, while preventing
>> >> >> anyone from ptracing the child. But doing this securely without help
>> >> >> from someone with a different uid might be impossible anyway.
>> >> >>
>> >> >> >
>> >> >> > 2. would execve clear (and re-init) the list of creds?
>> >> >>
>> >> >> Probably. Thoughts?
>> >> >
>> >> > Yeah it seems to me it should be re-initialized, with a cred added
>> >> > to the list for every open fd.
>> >>
>> >> What do you mean "every fd"?
>> >>
>> >> It seems odd to me that execve of anything that isn't setuid would add
>> >> anything to the list -- attackers can always ptrace before the execve
>> >> happens.
>> >
>> > Maybe you're right. Maybe I shouldn't reason about this on a friday
>> > afternoon.
>> >
>> > My *thought* was setuid-root program opens /etc/shadow, then execs a
>> > regular program keeping that open. Attaching to that fails now though,
>> > presumably due to dumpable.
>> >
>>
>> Why would it fail?
>
> I had expected it to succeed when I tried it, but it did in fact fail.
>
>> Isn't dumpable cleared on execve of a non-setuid
>> program? Maybe I need to look this stuff up again.
>
> I guess this particular case was handled by setup_new_exec:
>
> if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
> set_dumpable(current->mm, SUID_DUMP_USER);
> else
> set_dumpable(current->mm, suid_dumpable);
>
> since my euid was 0 and uid 1000, when I did the exec.
>

Then we need to keep this working. I guess we can just keep the
dumpable bit around.

--Andy