2016-03-08 05:15:51

by Andy Lutomirski

Subject: Thoughts on tightening up user namespace creation

Hi all-

There are several users and distros that are nervous about user
namespaces from an attack surface point of view.

- RHEL and Arch have userns disabled.

- Ubuntu requires CAP_SYS_ADMIN

- Kees periodically proposes to upstream some sysctl to control
userns creation.

I think there are three main types of concerns. First, there might be
some as-yet-unknown semantic issues that would allow privilege
escalation by users who create user namespaces and then confuse
something else in the system. Second, enabling user namespaces
exposes a lot of attack surface to unprivileged users. Third,
allowing tasks to create user namespaces exposes the kernel to various
resource exhaustion attacks that wouldn't be possible otherwise.

Since I doubt we'll ever fully address the attack surface issue at
least, would it make sense to try to come up with an upstreamable way
to limit who can create new user namespaces and/or do various
dangerous things with them?

I'll divide the rest of the email into the "what" and the "who".

+++ What does the privilege of creating a user namespace entail? +++

This could be an all-or-nothing thing. It would certainly be possible
for appropriately privileged tasks to be able to unshare namespaces
and use their facilities exactly like any task can in a current
user-ns-enabled kernel and for other tasks to be unable to unshare
anything.

Finer gradations are, in principle, possible. For example, it could
be possible for a given task to unshare its userns but to have limited
caps inside or to be unable to unshare certain other namespaces. For
example, maybe a task could unshare userns and mount ns but not net
ns. I don't think this would be particularly useful.

It might be more interesting to allow a task to unshare all
namespaces, hold all capabilities in them, but to still be unable to
use certain privileged facilities. For example, maybe denying
administrative control over iptables, creation of exotic network
interface types, or similar would make sense. I don't know how we'd
specify this type of constraint.

+++ Who can create user namespaces (possibly with restrictions)? +++

I can think of a few formulations.

A simpler approach would be to add a per-namespace setting listing
users and/or groups that can unshare their userns. A userns starts
out allowing everyone to unshare userns, and anyone with CAP_SYS_ADMIN
can change the setting.
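To make the simpler formulation concrete, here is a toy Python model of such a per-namespace setting. It is only a sketch of the semantics described above, not a kernel interface: the class name UsernsPolicy, its fields, and the boolean standing in for a CAP_SYS_ADMIN check are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UsernsPolicy:
    """Toy model of a per-namespace allow-list; not a kernel interface."""
    allow_all: bool = True                       # a new userns starts out open
    allowed_uids: Set[int] = field(default_factory=set)
    allowed_gids: Set[int] = field(default_factory=set)

    def restrict(self, caller_has_cap_sys_admin: bool, uids=(), gids=()) -> None:
        # Only a holder of CAP_SYS_ADMIN in this namespace may change the setting.
        if not caller_has_cap_sys_admin:
            raise PermissionError("CAP_SYS_ADMIN required")
        self.allow_all = False
        self.allowed_uids = set(uids)
        self.allowed_gids = set(gids)

    def may_unshare(self, uid: int, gids=()) -> bool:
        # Allowed if the namespace is still open, or the caller's uid or
        # any of its groups is on the list.
        if self.allow_all:
            return True
        return uid in self.allowed_uids or bool(self.allowed_gids & set(gids))
```

In this model a freshly created policy lets every uid through, and after restrict(True, uids=[1000]) only uid 1000 (or members of a listed group) may unshare.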

A fancier approach would be to have an fd that represents the right to
unshare your userns. Some privilege broker could give out those fds
to apps that need them and meet whatever criteria are set. If you try
to unshare your userns without the fd, it falls back to some simpler
policy.
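The fd-based formulation relies on ordinary descriptor passing over a Unix socket. The Python sketch below shows only that mechanism, using socket.send_fds and socket.recv_fds (Python 3.9+). No "right to unshare" fd exists today, so a pipe end stands in for it, and the helper names broker_grant and client_receive are made up for illustration.

```python
import os
import socket


def broker_grant(sock: socket.socket, token_fd: int) -> None:
    """Broker side: hand one descriptor to a client over a Unix socket."""
    socket.send_fds(sock, [b"grant"], [token_fd])


def client_receive(sock: socket.socket) -> int:
    """Client side: receive one descriptor, or raise if none was granted."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    if msg != b"grant" or not fds:
        raise PermissionError("broker did not grant a token")
    return fds[0]


if __name__ == "__main__":
    broker_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()   # stand-in for the hypothetical token fd
    broker_grant(broker_end, read_end)
    token = client_receive(client_end)
    os.write(write_end, b"ok")
    print(os.read(token, 2))          # the client's copy refers to the same pipe
```

The policy question (which clients the broker should grant to) is left out entirely; the sketch only shows that the capability-as-fd plumbing is cheap.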

I think I prefer the simpler one. It's simple, and I haven't come up
with a concrete problem with it yet.




Thoughts?


2016-03-08 06:07:07

by Serge Hallyn

Subject: Re: Thoughts on tightening up user namespace creation

On Mon, Mar 07, 2016 at 09:15:25PM -0800, Andy Lutomirski wrote:
> Hi all-
>
> There are several users and distros that are nervous about user
> namespaces from an attack surface point of view.
>
> - RHEL and Arch have userns disabled.
>
> - Ubuntu requires CAP_SYS_ADMIN

No, it does not. It has temporarily re-added a sysctl which can enable
that behavior, but it's not set by default. The reason for providing it
is not a distrust of user namespaces in general, but because we're enabling
some bleeding edge patches which haven't been accepted upstream yet. Once
they're accepted upstream I expect that patch to be dropped again, unless
it has gone upstream.

Debian does afaik still have a version of a patch I'd originally written
before user namespaces were upstream which defaulted unprivileged userns
cloning to off. Did you mean Debian here?
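For anyone who wants to check a given machine, a minimal Python sketch follows. It assumes the distro patch exposes its switch at /proc/sys/kernel/unprivileged_userns_clone; that path and the helper names are assumptions for illustration, and a kernel without the patch simply has no such file.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the distro-patch sysctl; absent on unpatched kernels.
DEFAULT_SYSCTL = "/proc/sys/kernel/unprivileged_userns_clone"


def read_userns_sysctl(path: str = DEFAULT_SYSCTL) -> Optional[int]:
    """Return the sysctl value as an int, or None if the file does not exist."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def describe(value: Optional[int]) -> str:
    """Turn the raw value into a short human-readable summary."""
    if value is None:
        return "no sysctl present"
    if value:
        return "unprivileged userns allowed"
    return "unprivileged userns requires privilege"


if __name__ == "__main__":
    print(describe(read_userns_sysctl()))
```

A missing file says nothing about whether user namespaces are compiled in at all; it only means this particular switch is not there.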

> - Kees periodically proposes to upstream some sysctl to control
> userns creation.
>
> I think there are three main types of concerns. First, there might be
> some as-yet-unknown semantic issues that would allow privilege
> escalation by users who create user namespaces and then confuse
> something else in the system. Second, enabling user namespaces
> exposes a lot of attack surface to unprivileged users. Third,
> allowing tasks to create user namespaces exposes the kernel to various
> resource exhaustion attacks that wouldn't be possible otherwise.
>
> Since I doubt we'll ever fully address the attack surface issue at
> least, would it make sense to try to come up with an upstreamable way
> to limit who can create new user namespaces and/or do various
> dangerous things with them?
>
> I'll divide the rest of the email into the "what" and the "who".
>
> +++ What does the privilege of creating a user namespace entail? +++
>
> This could be an all-or-nothing thing. It would certainly be possible
> for appropriately privileged tasks to be able to unshare namespaces
> and use their facilities exactly like any task can in a current
> user-ns-enabled kernel and for other tasks to be unable to unshare
> anything.
>
> Finer gradations are, in principle, possible. For example, it could
> be possible for a given task to unshare its userns but to have limited
> caps inside or to be unable to unshare certain other namespaces. For
> example, maybe a task could unshare userns and mount ns but not net
> ns. I don't think this would be particularly useful.
>
> It might be more interesting to allow a task to unshare all
> namespaces, hold all capabilities in them, but to still be unable to
> use certain privileged facilities. For example, maybe denying
> administrative control over iptables, creation of exotic network
> interface types, or similar would make sense. I don't know how we'd
> specify this type of constraint.
>
> +++ Who can create user namespaces (possibly with restrictions)? +++
>
> I can think of a few formulations.
>
> A simpler approach would be to add a per-namespace setting listing
> users and/or groups that can unshare their userns. A userns starts
> out allowing everyone to unshare userns, and anyone with CAP_SYS_ADMIN
> can change the setting.
>
> A fancier approach would be to have an fd that represents the right to
> unshare your userns. Some privilege broker could give out those fds
> to apps that need them and meet whatever criteria are set. If you try
> to unshare your userns without the fd, it falls back to some simpler
> policy.
>
> I think I prefer the simpler one. It's simple, and I haven't come up
> with a concrete problem with it yet.
>
>
>
>
> Thoughts?

2016-03-08 10:05:47

by Alexander Larsson

Subject: Re: Thoughts on tightening up user namespace creation

On mån, 2016-03-07 at 21:15 -0800, Andy Lutomirski wrote:
> Hi all-
>
> I think there are three main types of concerns.  First, there might
> be
> some as-yet-unknown semantic issues that would allow privilege
> escalation by users who create user namespaces and then confuse
> something else in the system.  Second, enabling user namespaces
> exposes a lot of attack surface to unprivileged users.  Third,
> allowing tasks to create user namespaces exposes the kernel to
> various
> resource exhaustion attacks that wouldn't be possible otherwise.

In my work on xdg-app I've seen some issues that I'd ideally like to
see a solution to. They are not necessarily security vulnerabilities,
but still problems:

devpts is only mountable in a user namespace if the root user is
mapped. Possible to work around, but ugly.

There is no way to recursively apply mount flags. For example, I often
want to recursively bind mount some directory from the host but with
MS_RDONLY|MS_NODEV. I cannot apply the flags in the MS_BIND|MS_REC
mount, so instead I have to first bind mount and then remount. However,
the remount is not recursive, so I have to manually parse
/proc/self/mountinfo and figure out all the submounts that were added.
Also, I have to manually avoid trying to remount covered mounts,
because I can't reach those, and for each remount I have to parse out
its current flags so I don't accidentally unset some set flag, causing
EPERM.
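The manual workaround described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not xdg-app's actual code: the names Mount, parse_mountinfo, submounts and remount_options are invented here, the LOCKABLE set is a guess at which per-mount options matter, and the sketch only computes the option list for each remount. It does not call mount(2) and does not detect covered mounts.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    mount_id: int
    parent_id: int
    mount_point: str
    options: List[str]


# Per-mount options that should be preserved on remount (assumed set).
LOCKABLE = {"ro", "nosuid", "nodev", "noexec", "noatime", "nodiratime", "relatime"}

SAMPLE = """\
22 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw
30 22 8:2 / /srv/data rw,nosuid,relatime shared:5 - ext4 /dev/sda2 rw
31 30 0:40 / /srv/data/my\\040cache rw,noexec - tmpfs tmpfs rw
32 22 0:41 / /srv/database rw - tmpfs tmpfs rw
"""


def _unescape(text: str) -> str:
    # mountinfo writes space, tab, newline and backslash as octal escapes.
    for esc, ch in (("\\040", " "), ("\\011", "\t"), ("\\012", "\n"), ("\\134", "\\")):
        text = text.replace(esc, ch)
    return text


def parse_mountinfo(text: str) -> List[Mount]:
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        left, _sep, _right = line.partition(" - ")   # optional fields end at " - "
        f = left.split()
        mounts.append(Mount(int(f[0]), int(f[1]), _unescape(f[4]), f[5].split(",")))
    return mounts


def submounts(mounts: List[Mount], root: str) -> List[Mount]:
    """Mounts at or below root, matched on whole path components."""
    root = root.rstrip("/") or "/"
    prefix = root if root == "/" else root + "/"
    return [m for m in mounts if m.mount_point == root or m.mount_point.startswith(prefix)]


def remount_options(m: Mount, extra=("ro", "nodev")) -> List[str]:
    """Keep the flags already set on the mount, then add the new ones."""
    keep = [o for o in m.options if o in LOCKABLE]
    return keep + [o for o in extra if o not in keep]


if __name__ == "__main__":
    for m in submounts(parse_mountinfo(SAMPLE), "/srv/data"):
        print(m.mount_point, ",".join(remount_options(m)))
```

Note that /srv/database is not treated as a submount of /srv/data, and that the existing nosuid and noexec flags are carried over so the remount does not try to clear them.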


Mount flags are not applied on propagated mounts. Even if I do all the
stuff above, if I get a *new* mount propagated into my namespace, or if
a parent unmount is propagated uncovering a mount in my namespace,
then this new mountpoint is not read-only. This has no workaround that
I'm currently aware of.

Abstract unix domain sockets are tied to the network namespace. I
understand where this comes from: socket syscalls are "networkish".
However, the non-abstract unix domain sockets are under the control of
the filesystem namespace, and I can fully control them when setting up
the sandbox. But as long as the sandbox shares the network namespace
with the host (which is likely for desktop apps) it will have full
access to all services listening on abstract sockets on the host. This
is particularly problematic because 1) abstract sockets have no file
permissions, so any X server running on the host is wide open, and 2)
whether a connect call uses abstract sockets is not detectable via
seccomp, so we can't filter it in any other way. I don't know how severe
this is, as it depends on how trustworthy the individual services are,
but at least on my system "grep @ /proc/net/unix" lists session dbus
instances, the X server, and some iSCSI thing.
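The same check as the grep can be done a little more carefully in Python by looking only at the path column, so that an "@" elsewhere on a line is not counted. This is a small sketch; the function name abstract_sockets and the sample socket names are made up for illustration.

```python
from typing import List

SAMPLE = """\
Num       RefCount Protocol Flags    Type St Inode Path
0000000000000000: 00000002 00000000 00010000 0001 01 17001 @/tmp/.X11-unix/X0
0000000000000000: 00000002 00000000 00010000 0001 01 17002 /run/dbus/system_bus_socket
0000000000000000: 00000003 00000000 00000000 0001 03 17003
0000000000000000: 00000002 00000000 00010000 0001 01 17004 @/tmp/dbus-AbCdEf
"""


def abstract_sockets(proc_net_unix: str) -> List[str]:
    """Return the names of abstract sockets found in /proc/net/unix text."""
    names = []
    for line in proc_net_unix.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 7)              # path is the optional 8th field
        if len(fields) == 8 and fields[7].startswith("@"):
            names.append(fields[7])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/net/unix") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE                              # fall back to the sample data
    for name in abstract_sockets(text):
        print(name)
```

Every name printed is reachable from any process that shares the host's network namespace, regardless of what the sandbox's filesystem view looks like.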

/proc (even the limited pid namespace one) contains a lot of old cruft
that at a minimum leaks hardware info to the sandbox, and could
potentially do worse (/proc/sysrq-trigger anyone?). I'd like to be able
to mount a "clean" /proc that has only the process-related stuff.

> +++ What does the privilege of creating a user namespace entail? +++
>

> It might be more interesting to allow a task to unshare all
> namespaces, hold all capabilities in them, but to still be unable to
> use certain privileged facilities.  For example, maybe denying
> administrative control over iptables, creation of exotic network
> interface types, or similar would make sense.  

> I don't know how we'd specify this type of constraint.

I think this particular issue is the main problem here. Unless we add
some very coarse bit-flags that specify the constraints it is going to
be a very complex API to set up such constraints. Adding coarse
bit-flags essentially means adding new capabilities (maybe subsetting
existing ones). Given how hard it is to understand how all the current
capabilities interact and how they can be exploited I'm not sure this
is a great idea.

Maybe we can use the LSM framework to model the constraints? For
instance, the user could be allowed to create user namespaces, but the
processes in them automatically get some SELinux context applied. Then
that SELinux context could be configured to limit access to certain
operations.

> +++ Who can create user namespaces (possibly with restrictions)? +++
>
> I can think of a few formulations.
>
> A simpler approach would be to add a per-namespace setting listing
> users and/or groups that can unshare their userns.  A userns starts
> out allowing everyone to unshare userns, and anyone with
> CAP_SYS_ADMIN
> can change the setting.

This sounds like a cgroup controller to me. It makes sense for my
use case (i.e. sandboxed desktop apps). You want to give all processes
in the user's login session access to user namespaces, but not
necessarily a service, background process, or cron job running as
that user.

> A fancier approach would be to have an fd that represents the right
> to
> unshare your userns.  Some privilege broker could give out those fds
> to apps that need them and meet whatever criteria are set.  If you
> try
> to unshare your userns without the fd, it falls back to some simpler
> policy.

In practice though, how would the privilege broker know and apply the
criteria? It doesn't even have the information the kernel has (such as
race-free access to the peer cgroup).

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
[email protected] [email protected]
He's an ungodly devious paramedic on his last day in the job. She's a
sharp-shooting cigar-chomping archaeologist married to the Mob. They
fight crime!


2016-03-08 16:41:17

by Eric W. Biederman

Subject: Re: Thoughts on tightening up user namespace creation

Andy Lutomirski <[email protected]> writes:

> Hi all-

[Snip strange things distros do]

Distros do strange things from other people's perspectives. Sometimes we
can help with that, sometimes we can't. In general, producing kernel code
that is reliable and well maintained is what we can do. Distro folks
can decide what they are comfortable with beyond that.

Frankly I find it heartening that not all distros enable everything all
of the time, and are showing some modicum of restraint and judgement.

If folks don't think a feature like user namespaces is ready and they
don't need that feature I am quite happy for them not to enable that
feature in their kernel.

> Since I doubt we'll ever fully address the attack surface issue at
> least, would it make sense to try to come up with an upstreamable way
> to limit who can create new user namespaces and/or do various
> dangerous things with them?

Even without user namespaces the kernel has attack surface issues. The
kernel is big and bugs happen. That surface is only bigger when you are
root in a user namespace, so the probability of finding an exploitable
bug goes up.

> I'll divide the rest of the email into the "what" and the "who".
>
> +++ What does the privilege of creating a user namespace entail? +++
>
> This could be an all-or-nothing thing. It would certainly be possible
> for appropriately privileged tasks to be able to unshare namespaces
> and use their facilities exactly like any task can in a current
> user-ns-enabled kernel and for other tasks to be unable to unshare
> anything.
>
> Finer gradations are, in principle, possible. For example, it could
> be possible for a given task to unshare its userns but to have limited
> caps inside or to be unable to unshare certain other namespaces. For
> example, maybe a task could unshare userns and mount ns but not net
> ns. I don't think this would be particularly useful.

I am actually inclined to think just the opposite. There was a period
where we would have been much less susceptible to problems if only
unprivileged creation of mount namespaces could have been implemented.

When I look at this from a resource consumption point of view I
definitely see arguments for limiting things by resource type, as it
can be very easy to know that I need no more than X of some specific
resource type without knowing how much memory that will take.

> It might be more interesting to allow a task to unshare all
> namespaces, hold all capabilities in them, but to still be unable to
> use certain privileged facilities. For example, maybe denying
> administrative control over iptables, creation of exotic network
> interface types, or similar would make sense. I don't know how we'd
> specify this type of constraint.

That does seem to start approaching lsm territory. And there is a funny
balance between reducing attack surface and adding attack surface to
reduce attack surface.

> +++ Who can create user namespaces (possibly with restrictions)? +++
>
> I can think of a few formulations.
>
> A simpler approach would be to add a per-namespace setting listing
> users and/or groups that can unshare their userns. A userns starts
> out allowing everyone to unshare userns, and anyone with CAP_SYS_ADMIN
> can change the setting.
>
> A fancier approach would be to have an fd that represents the right to
> unshare your userns. Some privilege broker could give out those fds
> to apps that need them and meet whatever criteria are set. If you try
> to unshare your userns without the fd, it falls back to some simpler
> policy.
>
> I think I prefer the simpler one. It's simple, and I haven't come up
> with a concrete problem with it yet.

Agreed. Your simple scheme is roughly what I was proposing earlier of
having a per user limit on the number of user namespaces they can
create.

I am a little partial to having it be a resource limit as that covers
more use cases with less code.
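As a rough illustration of the resource-limit idea, here is a toy Python model of a per-user cap on live user namespaces. It is a sketch of the accounting only. The class UsernsLimiter and its methods are hypothetical names, and nothing here corresponds to an existing kernel interface.

```python
from collections import Counter
from typing import Dict


class UsernsLimiter:
    """Toy per-user cap on live user namespaces; all names are hypothetical."""

    def __init__(self, default_limit: int) -> None:
        self.default_limit = default_limit
        self.limits: Dict[int, int] = {}   # uid -> per-user override
        self.live: Counter = Counter()     # uid -> namespaces currently alive

    def set_limit(self, uid: int, limit: int) -> None:
        self.limits[uid] = limit

    def create(self, uid: int) -> bool:
        """Account for one new namespace; False means the request is refused."""
        limit = self.limits.get(uid, self.default_limit)
        if self.live[uid] >= limit:
            return False
        self.live[uid] += 1
        return True

    def destroy(self, uid: int) -> None:
        """Release one namespace previously charged to uid."""
        if self.live[uid] > 0:
            self.live[uid] -= 1
```

A limit of 0 for a uid turns creation off for that user entirely, which is how a single counter can cover both the "disable it" and the "bound resource use" cases.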

That said the really important case to cover is the case where some
subset of applications are denied access to resources (for sandboxing)
and another subset is allowed.

Eric

2016-03-08 18:32:21

by Andy Lutomirski

Subject: Re: Thoughts on tightening up user namespace creation

On Mar 7, 2016 10:06 PM, "Serge E. Hallyn" <[email protected]> wrote:
>
> On Mon, Mar 07, 2016 at 09:15:25PM -0800, Andy Lutomirski wrote:
> > - Ubuntu requires CAP_SYS_ADMIN
>
> No, it does not. It has temporarily re-added a sysctl which can enable
> that behavior, but it's not set by default. The reason for providing it
> is not a distrust of user namespaces in general, but because we're enabling
> some bleeding edge patches which haven't been accepted upstream yet. Once
> they're accepted upstream I expect that patch to be dropped again, unless
> it has gone upstream.
>
> Debian does afaik still have a version of a patch I'd originally written
> before user namespaces were upstream which defaulted unprivileged userns
> cloning to off. Did you mean Debian here?

I meant Ubuntu 14.04, which I tested, possibly poorly.

2016-03-08 22:41:26

by Serge E. Hallyn

Subject: Re: Thoughts on tightening up user namespace creation

Quoting Andy Lutomirski ([email protected]):
> On Mar 7, 2016 10:06 PM, "Serge E. Hallyn" <[email protected]> wrote:
> >
> > On Mon, Mar 07, 2016 at 09:15:25PM -0800, Andy Lutomirski wrote:
> > > - Ubuntu requires CAP_SYS_ADMIN
> >
> > No, it does not. It has temporarily re-added a sysctl which can enable
> > that behavior, but it's not set by default. The reason for providing it
> > is not a distrust of user namespaces in general, but because we're enabling
> > some bleeding edge patches which haven't been accepted upstream yet. Once
> > they're accepted upstream I expect that patch to be dropped again, unless
> > it has gone upstream.
> >
> > Debian does afaik still have a version of a patch I'd originally written
> > before user namespaces were upstream which defaulted unprivileged userns
> > cloning to off. Did you mean Debian here?
>
> I meant Ubuntu 14.04, which I tested, possibly poorly.

Weird, 14.04 with the default kernel (3.13.0-79-generic #123-Ubuntu)
doesn't have the sysctl at all.

-serge

2016-03-09 18:14:48

by Kees Cook

Subject: Re: Thoughts on tightening up user namespace creation

On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
> Hi all-
>
> There are several users and distros that are nervous about user
> namespaces from an attack surface point of view.
>
> - RHEL and Arch have userns disabled.
>
> - Ubuntu requires CAP_SYS_ADMIN
>
> - Kees periodically proposes to upstream some sysctl to control
> userns creation.

And here's another ring0 escalation flaw, made available to
unprivileged users because of userns:

https://code.google.com/p/google-security-research/issues/detail?id=758

> I think there are three main types of concerns. First, there might be
> some as-yet-unknown semantic issues that would allow privilege
> escalation by users who create user namespaces and then confuse
> something else in the system. Second, enabling user namespaces
> exposes a lot of attack surface to unprivileged users. Third,
> allowing tasks to create user namespaces exposes the kernel to various
> resource exhaustion attacks that wouldn't be possible otherwise.
>
> Since I doubt we'll ever fully address the attack surface issue at
> least, would it make sense to try to come up with an upstreamable way
> to limit who can create new user namespaces and/or do various
> dangerous things with them?

The change in attack surface is _substantial_. We must have a way to
globally disable userns.

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2016-03-09 18:51:19

by Colin Walters

Subject: Re: Thoughts on tightening up user namespace creation

On Wed, Mar 9, 2016, at 01:14 PM, Kees Cook wrote:
> On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
> > Hi all-
> >
> > There are several users and distros that are nervous about user
> > namespaces from an attack surface point of view.
> >
> > - RHEL and Arch have userns disabled.
> >
> > - Ubuntu requires CAP_SYS_ADMIN
> >
> > - Kees periodically proposes to upstream some sysctl to control
> > userns creation.
>
> And here's another ring0 escalation flaw, made available to
> unprivileged users because of userns:
>
> https://code.google.com/p/google-security-research/issues/detail?id=758

Looks like Andy won't have to eat his hat ;)

> The change in attack surface is _substantial_. We must have a way to
> globally disable userns.

No one would object if it was enabled but only accessible to
CAP_SYS_ADMIN though, right? This could be useful for
writing setuid binaries that expose some of the features, but e.g. not
CAP_NET_ADMIN.

Andy's suggestion of having this be a per-namespace setting makes
sense to me. Currently some container tools that do use userns
are by default denying it to be recursive (Sandstorm.io and Docker 1.10 at least)
by using a seccomp filter on clone(). If we had this setting that
filter wouldn't be necessary, and would solve the issue that seccomp filters
aren't robust against the kernel adding new API, e.g. a new CLONE_NEWUSER_NONEWPRIVS
which might enable chroot() but not CAP_NET_ADMIN.
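The robustness problem can be shown with a small Python model of what such a filter decides. This is not seccomp BPF, only the predicate it encodes. CLONE_NEWUSER is the real flag value; CLONE_NEWUSER_NONEWPRIVS does not exist and its value below is invented purely for illustration.

```python
CLONE_NEWUSER = 0x10000000            # real value from <linux/sched.h>
CLONE_NEWUSER_NONEWPRIVS = 1 << 40    # hypothetical flag, value invented here

# Syscalls whose flags argument the model filter inspects.
FILTERED_SYSCALLS = {"clone", "unshare"}


def filter_allows(syscall: str, flags: int) -> bool:
    """Mimic a filter rule: deny when the known userns bit is set."""
    if syscall in FILTERED_SYSCALLS and flags & CLONE_NEWUSER:
        return False
    return True


if __name__ == "__main__":
    print(filter_allows("unshare", CLONE_NEWUSER))            # denied by the rule
    print(filter_allows("clone", CLONE_NEWUSER_NONEWPRIVS))   # slips through
```

A filter written against today's flag set has no way to know about a bit or syscall added later, whereas a per-namespace setting would be enforced by the kernel at the point where the namespace is created.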

2016-03-09 19:05:37

by Austin S Hemmelgarn

Subject: Re: Thoughts on tightening up user namespace creation

On 2016-03-09 13:51, Colin Walters wrote:
> On Wed, Mar 9, 2016, at 01:14 PM, Kees Cook wrote:
>> On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
>>> Hi all-
>>>
>>> There are several users and distros that are nervous about user
>>> namespaces from an attack surface point of view.
>>>
>>> - RHEL and Arch have userns disabled.
>>>
>>> - Ubuntu requires CAP_SYS_ADMIN
>>>
>>> - Kees periodically proposes to upstream some sysctl to control
>>> userns creation.
>>
>> And here's another ring0 escalation flaw, made available to
>> unprivileged users because of userns:
>>
>> https://code.google.com/p/google-security-research/issues/detail?id=758
>
> Looks like Andy won't have to eat his hat ;)
>
>> The change in attack surface is _substantial_. We must have a way to
>> globally disable userns.
>
> No one would object if it was enabled but only accessible to
> CAP_SYS_ADMIN though, right? This could be useful for
> writing setuid binaries that expose some of the features, but e.g. not
> CAP_NET_ADMIN.
At least Google Chrome (and probably Chromium) is using user namespaces
without CAP_SYS_ADMIN (although AFAIUI, it's because they can't use the
other namespace types effectively as a regular user).
>
> Andy's suggestion of having this be a per-namespace setting makes
> sense to me. Currently some container tools that do use userns
> are by default denying it to be recursive (Sandstorm.io and Docker 1.10 at least)
> by using a seccomp filter on clone(). If we had this setting that
> filter wouldn't be necessary, and would solve the issue that seccomp filters
> aren't robust against the kernel adding new API, e.g. a new CLONE_NEWUSER_NONEWPRIVS
> which might enable chroot() but not CAP_NET_ADMIN.
>
Personally, I like the suggestion from Alexander Larsson to make a
cgroup controller. Container tools obviously want some degree of
hierarchical control (even if it's just saying that the hierarchy ends
here), and it would simplify the possibility of running more than one
container stack on the same host (I know at least a couple people who
would love to be able to safely use Docker on the same host as LXC or
lmctfy).

2016-03-09 19:07:35

by Serge E. Hallyn

Subject: Re: Thoughts on tightening up user namespace creation

Quoting Kees Cook ([email protected]):
> On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
> > Hi all-
> >
> > There are several users and distros that are nervous about user
> > namespaces from an attack surface point of view.
> >
> > - RHEL and Arch have userns disabled.
> >
> > - Ubuntu requires CAP_SYS_ADMIN
> >
> > - Kees periodically proposes to upstream some sysctl to control
> > userns creation.
>
> And here's another ring0 escalation flaw, made available to
> unprivileged users because of userns:
>
> https://code.google.com/p/google-security-research/issues/detail?id=758

Kees, I think you think this makes your point, but all it does is make
me want to argue with you and start flinging back cves against kvm,
af_unix, sctp, etc.

> > I think there are three main types of concerns. First, there might be
> > some as-yet-unknown semantic issues that would allow privilege
> > escalation by users who create user namespaces and then confuse
> > something else in the system. Second, enabling user namespaces
> > exposes a lot of attack surface to unprivileged users. Third,
> > allowing tasks to create user namespaces exposes the kernel to various
> > resource exhaustion attacks that wouldn't be possible otherwise.
> >
> > Since I doubt we'll ever fully address the attack surface issue at
> > least, would it make sense to try to come up with an upstreamable way
> > to limit who can create new user namespaces and/or do various
> > dangerous things with them?
>
> The change in attack surface is _substantial_. We must have a way to
> globally disable userns.

I'm confused. Didn't we agree a few months ago, somewhat reluctantly,
on a sysctl?

2016-03-09 19:12:26

by Kees Cook

Subject: Re: Thoughts on tightening up user namespace creation

On Wed, Mar 9, 2016 at 11:07 AM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Kees Cook ([email protected]):
>> On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
>> > Hi all-
>> >
>> > There are several users and distros that are nervous about user
>> > namespaces from an attack surface point of view.
>> >
>> > - RHEL and Arch have userns disabled.
>> >
>> > - Ubuntu requires CAP_SYS_ADMIN
>> >
>> > - Kees periodically proposes to upstream some sysctl to control
>> > userns creation.
>>
>> And here's another ring0 escalation flaw, made available to
>> unprivileged users because of userns:
>>
>> https://code.google.com/p/google-security-research/issues/detail?id=758
>
> Kees, I think you think this makes your point, but all it does is make
> me want to argue with you and start flinging back cves against kvm,
> af_unix, sctp, etc.

I can run a distro kernel without kvm and sctp, because I can leave
their modules unloaded. There is no such option for userns.

The last af_unix CVEs I see were 2 from 2013, and before that, 2010.
There's no comparison here on frequency.

>> > I think there are three main types of concerns. First, there might be
>> > some as-yet-unknown semantic issues that would allow privilege
>> > escalation by users who create user namespaces and then confuse
>> > something else in the system. Second, enabling user namespaces
>> > exposes a lot of attack surface to unprivileged users. Third,
>> > allowing tasks to create user namespaces exposes the kernel to various
>> > resource exhaustion attacks that wouldn't be possible otherwise.
>> >
>> > Since I doubt we'll ever fully address the attack surface issue at
>> > least, would it make sense to try to come up with an upstreamable way
>> > to limit who can create new user namespaces and/or do various
>> > dangerous things with them?
>>
>> The change in attack surface is _substantial_. We must have a way to
>> globally disable userns.
>
> I'm confused. Didn't we agree a few months ago, somewhat reluctantly,
> on a sysctl?

No, Eric refused it and wanted finer-grained controls.

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2016-03-09 19:21:19

by Serge E. Hallyn

Subject: Re: Thoughts on tightening up user namespace creation

Quoting Colin Walters ([email protected]):
> On Wed, Mar 9, 2016, at 01:14 PM, Kees Cook wrote:
> > On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
> > > Hi all-
> > >
> > > There are several users and distros that are nervous about user
> > > namespaces from an attack surface point of view.
> > >
> > > - RHEL and Arch have userns disabled.
> > >
> > > - Ubuntu requires CAP_SYS_ADMIN
> > >
> > > - Kees periodically proposes to upstream some sysctl to control
> > > userns creation.
> >
> > And here's another ring0 escalation flaw, made available to
> > unprivileged users because of userns:
> >
> > https://code.google.com/p/google-security-research/issues/detail?id=758
>
> Looks like Andy won't have to eat his hat ;)
>
> > The change in attack surface is _substantial_. We must have a way to
> > globally disable userns.
>
> No one would object if it was enabled but only accessible to
> CAP_SYS_ADMIN though, right? This could be useful for

I think that would be terrible. I'd have to expose all of CAP_SYS_ADMIN
to allow use of CLONE_NEWUSER. I'd be more interested in a new CAP_NEWUSER
capability. Then systems wanting to support unprivileged users doing user
namespaces could set a pam module giving certain users that cap in pI, and
set it on fI on their container managers. Userspace has to give access to
mapped uids through /etc/subuid too, so it's not *so* huge an added hurdle.
Well that's not quite true - with empty subuid, users can create a userns
with no mapped userids which in itself is useful for sandboxing.
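To illustrate the subuid step, here is a short Python sketch that reads /etc/subuid-style text (one "name:start:count" entry per line) and builds uid_map-style lines ("inside outside count"). The helper names are made up, and the choice to map the caller's own uid to 0 first is just one common layout, assumed here for the example. Writing the extra ranges for real needs a privileged helper, which the sketch does not attempt.

```python
from typing import Dict, List, Tuple

Ranges = Dict[str, List[Tuple[int, int]]]


def parse_subuid(text: str) -> Ranges:
    """Parse 'name:start:count' lines into {name: [(start, count), ...]}."""
    ranges: Ranges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        ranges.setdefault(name, []).append((int(start), int(count)))
    return ranges


def uid_map_lines(user: str, own_uid: int, ranges: Ranges) -> List[str]:
    """Build uid_map-style lines: own uid as 0, then any delegated ranges."""
    lines = [f"0 {own_uid} 1"]
    next_inside = 1
    for start, count in ranges.get(user, []):
        lines.append(f"{next_inside} {start} {count}")
        next_inside += count
    return lines


if __name__ == "__main__":
    table = parse_subuid("alice:100000:65536\nbob:165536:65536\n")
    print("\n".join(uid_map_lines("alice", 1000, table)))
```

A user with no subuid entry gets no delegated range at all in this sketch, which matches the point above that such a namespace is still useful for sandboxing.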

The biggest problem with a CAP_NEWUSER would be that it's more inherently
permanent than a new sysctl. The increase in attack surface is real, but
over time I'd like to think that we will have dealt with it and should be
able to make CLONE_NEWUSER unprivileged. Because what we have is an
implementation issue (not in user namespaces), not a design issue.

And I do agree the issue is real.

-serge

2016-03-09 19:25:52

by Kees Cook

Subject: Re: Thoughts on tightening up user namespace creation

On Wed, Mar 9, 2016 at 11:21 AM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Colin Walters ([email protected]):
>> On Wed, Mar 9, 2016, at 01:14 PM, Kees Cook wrote:
>> > On Mon, Mar 7, 2016 at 9:15 PM, Andy Lutomirski <[email protected]> wrote:
>> > > Hi all-
>> > >
>> > > There are several users and distros that are nervous about user
>> > > namespaces from an attack surface point of view.
>> > >
>> > > - RHEL and Arch have userns disabled.
>> > >
>> > > - Ubuntu requires CAP_SYS_ADMIN
>> > >
>> > > - Kees periodically proposes to upstream some sysctl to control
>> > > userns creation.
>> >
>> > And here's another ring0 escalation flaw, made available to
>> > unprivileged users because of userns:
>> >
>> > https://code.google.com/p/google-security-research/issues/detail?id=758
>>
>> Looks like Andy won't have to eat his hat ;)
>>
>> > The change in attack surface is _substantial_. We must have a way to
>> > globally disable userns.
>>
>> No one would object if it was enabled but only accessible to
>> CAP_SYS_ADMIN though, right? This could be useful for
>
> I think that would be terrible. I'd have to expose all of CAP_SYS_ADMIN
> to allow use of CLONE_NEWUSER. I'd be more interested in a new CAP_NEWUSER
> capability. Then systems wanting to support unprivileged users doing user
> namespaces could set a pam module giving certain users that cap in pI, and
> set it on fI on their container managers. Userspace has to give access to
> mapped uids through /etc/subuid too, so it's not *so* huge an added hurdle.
> Well that's not quite true - with empty subuid, users can create a userns
> with no mapped userids which in itself is useful for sandboxing.
>
> The biggest problem with a CAP_NEWUSER would be that it's more inherently
> permanent than a new sysctl. The increase in attack surface is real, but
> over time I'd like to think that we will have dealt with it and should be
> able to make CLONE_NEWUSER unprivileged. Because what we have is an
> implementation issue (not in user namespaces), not a design issue.

Andy suggested a capability back in October. But I agree with you, we
don't want a new capability. https://lkml.org/lkml/2015/10/17/94

> And I do agree the issue is real.

And I fully expect for the issue to improve over time: it's not that I
don't want userns, I just want to have the _option_ to disable it at
runtime for the systems that don't need it until the newly exposed
interfaces look like they've had the bulk of their issues resolved.

-Kees

--
Kees Cook
Chrome OS & Brillo Security