2012-10-22 13:45:54

by Aristeu Rozanski

Subject: [PATCH 0/4] Rebase device_cgroup v2 patchset

This patchset rebases v2 of the patchset, since v1 was pushed into -rc1
instead. The last patch, not present in the previous patchset, fixes the
permission check when allowing everything in a cgroup.

device_cgroup.c | 87 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 61 insertions(+), 26 deletions(-)

Cc: Dave Jones <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: James Morris <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Aristeu Rozanski <[email protected]>

--
Aristeu


2012-10-22 19:58:41

by Andrew Morton

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

On Mon, 22 Oct 2012 09:45:36 -0400
Aristeu Rozanski <[email protected]> wrote:

> This patchset rebases the v2 of the patchset since the v1 was pushed into -rc1
> instead. The last patch, not present on previous patchset, fixes the
> permission check when allowing everything in a cgroup.
>

I grabbed all four, thanks. Shall send them on to Linus in a week or
so, after a bit more review and some linux-next testing.

I've been trying to work out why I merged the v1 patchset rather than
v2 and cannot find that v2 patchset anywhere. When was it sent out?

2012-10-22 20:14:38

by Aristeu Rozanski

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

Hi Andrew,
On Mon, Oct 22, 2012 at 12:58:38PM -0700, Andrew Morton wrote:
> On Mon, 22 Oct 2012 09:45:36 -0400
> Aristeu Rozanski <[email protected]> wrote:
>
> > This patchset rebases the v2 of the patchset since the v1 was pushed into -rc1
> > instead. The last patch, not present on previous patchset, fixes the
> > permission check when allowing everything in a cgroup.
> >
>
> I grabbed all four, thanks. Shall send them on to Linus in a week or
> so, after a bit more review and some linux-next testing.

thanks, much appreciated

> I've been trying to work out why I merged the v1 patchset rather than
> v2 and cannot find that v2 patchset anywhere. When was it sent out?

It was sent to linux-kernel and cgroups@vger on Sep 04th, here's the
msgid of patch 0 for reference:
<[email protected]>

--
Aristeu

2013-05-14 15:05:48

by Serge Hallyn

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

Hi,

so now that the device cgroup properly respects hierarchy, not allowing
a cgroup to be given greater permission than its parent, should we consider
relaxing the capability checks?
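
As a rough illustration of the hierarchy rule mentioned above, here is a
userspace model of "a child may only be given what its parent allows". It
is a sketch under simplifying assumptions, not the kernel's code: struct
dev_rule, rule_covers() and parent_allows() are names made up for this
example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Access bits, mirroring the r/w/m letters used in devices.allow. */
enum { ACC_READ = 1, ACC_WRITE = 2, ACC_MKNOD = 4 };

#define ANY_DEV (-1)	/* wildcard major or minor ('*') */

/* Hypothetical, simplified rule; the kernel's own structure differs. */
struct dev_rule {
	char type;		/* 'b', 'c', or 'a' for all */
	int major;		/* device major, or ANY_DEV */
	int minor;		/* device minor, or ANY_DEV */
	unsigned access;	/* OR of ACC_* bits */
};

/* Does a single allowed rule fully cover the requested one? */
static bool rule_covers(const struct dev_rule *allowed,
			const struct dev_rule *req)
{
	if (allowed->type != 'a' && allowed->type != req->type)
		return false;
	if (allowed->major != ANY_DEV && allowed->major != req->major)
		return false;
	if (allowed->minor != ANY_DEV && allowed->minor != req->minor)
		return false;
	/* Every requested access bit must be present in the allowed rule. */
	return (req->access & ~allowed->access) == 0;
}

/* A child may gain `req` only if some rule of its parent covers it. */
static bool parent_allows(const struct dev_rule *parent, size_t n,
			  const struct dev_rule *req)
{
	for (size_t i = 0; i < n; i++)
		if (rule_covers(&parent[i], req))
			return true;
	return false;
}
```

With a parent that allows only reads on block major 7, a child request for
read access to 7:0 is accepted by this model, while a request for read and
write access is refused.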

There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
devcgroup_can_attach() to protect changing another task's cgroup, and
one in devcgroup_update_access() to protect writes to the devices.allow
and devices.deny files.

I think the first should be changed to a check for ns_capable() to
the victim's user_ns. Something like

--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
 			struct cgroup_taskset *set)
 {
 	struct task_struct *task = cgroup_taskset_first(set);
+	struct user_namespace *ns;
+	int ret = -EPERM;

-	if (current != task && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	return 0;
+	if (current == task)
+		return 0;
+
+	ns = userns_get(task);
+	ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
+	put_user_ns(ns);
+	return ret;
 }

 /*
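
To make the difference between the two checks concrete: capable() tests a
capability against the initial user namespace, while ns_capable() tests it
against the namespace passed in. The following is a userspace model of
that namespace-relative check, a sketch only. struct user_ns, struct
task_cred and model_ns_capable() are stand-ins invented for this example
and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the kernel's definitions. */
struct user_ns {
	const struct user_ns *parent;	/* NULL for the initial namespace */
	unsigned owner_uid;		/* uid that created this namespace */
};

struct task_cred {
	const struct user_ns *ns;	/* namespace the task lives in */
	unsigned euid;
	bool cap_sys_admin;		/* CAP_SYS_ADMIN held in `ns` */
};

/*
 * Model of a namespace-relative capability check: does `cred` hold
 * CAP_SYS_ADMIN with respect to `target`?
 */
static bool model_ns_capable(const struct task_cred *cred,
			     const struct user_ns *target)
{
	for (const struct user_ns *ns = target; ns; ns = ns->parent) {
		/* Same namespace: the task's own capability set decides. */
		if (ns == cred->ns)
			return cred->cap_sys_admin;
		/* The creator of a child namespace is privileged over it. */
		if (ns->parent == cred->ns && ns->owner_uid == cred->euid)
			return true;
	}
	/* Reached the top without meeting the caller's namespace. */
	return false;
}
```

In this model a container root passes the check against its own namespace
but fails it against the initial namespace, which is the case a plain
capable() call corresponds to.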

For the second, the hierarchy support should let us ignore concerns
about unprivileged users escalating privilege, but I'm trying to decide
whether we need to worry about the sendmail capability class of bugs.
My sense is actually the answer is no, and we can drop the capable()
check altogether. The reason is that while userspace frequently doesn't
properly handle a failing system call due to unexpected lack of partial
privilege, I wouldn't expect any setuid root program to ignore failure
to open or mknod a device file (and proceed into a bad failure mode).
Does this sound reasonable, or a recipe for disaster?

-serge

2013-05-14 15:51:15

by Aristeu Rozanski

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> so now that the device cgroup properly respects hierarchy, not allowing
> a cgroup to be given greater permission than its parent, should we consider
> relaxing the capability checks?
>
> There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
> devcgroup_can_attach() to protect changing another task's cgroup, and
> one in devcgroup_update_access() to protect writes to the devices.allow
> and devices.deny files.
>
> I think the first should be changed to a check for ns_capable() to
> the victim's user_ns. Something like
>
> --- a/security/device_cgroup.c
> +++ b/security/device_cgroup.c
> @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> struct cgroup_taskset *set)
> {
> struct task_struct *task = cgroup_taskset_first(set);
> + struct user_namespace *ns;
> + int ret = -EPERM;
>
> - if (current != task && !capable(CAP_SYS_ADMIN))
> - return -EPERM;
> - return 0;
> + if (current == task)
> + return 0;
> +
> + ns = userns_get(task);
> + ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> + put_user_ns(ns);
> + return ret;
> }

wouldn't this allow a userns root to move a task in the same userns into
a parent cgroup? I believe that anything but moving down the hierarchy
would be very complicated to verify (how far up can you go).

> For the second, the hierarchy support should let us ignore concerns
> about unprivileged users escalating privilege, but I'm trying to decide
> whether we need to worry about the sendmail capability class of bugs.

You have a pointer for more information on those?

> My sense is actually the answer is no, and we can drop the capable()
> check altogether. The reason is that while userspace frequently doesn't
> properly handle a failing system call due to unexpected lack of partial
> privilege, I wouldn't expect any setuid root program to ignore failure
> to open or mknod a device file (and proceed into a bad failure mode).
> Does this sound reasonable, or a recipe for disaster?

The second case sounds ok to me

--
Aristeu

2013-05-14 16:22:45

by Serge Hallyn

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

Quoting Aristeu Rozanski ([email protected]):
> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> > so now that the device cgroup properly respects hierarchy, not allowing
> > a cgroup to be given greater permission than its parent, should we consider
> > relaxing the capability checks?
> >
> > There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
> > devcgroup_can_attach() to protect changing another task's cgroup, and
> > one in devcgroup_update_access() to protect writes to the devices.allow
> > and devices.deny files.
> >
> > I think the first should be changed to a check for ns_capable() to
> > the victim's user_ns. Something like
> >
> > --- a/security/device_cgroup.c
> > +++ b/security/device_cgroup.c
> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> > struct cgroup_taskset *set)
> > {
> > struct task_struct *task = cgroup_taskset_first(set);
> > + struct user_namespace *ns;
> > + int ret = -EPERM;
> >
> > - if (current != task && !capable(CAP_SYS_ADMIN))
> > - return -EPERM;
> > - return 0;
> > + if (current == task)
> > + return 0;
> > +
> > + ns = userns_get(task);
> > + ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> > + put_user_ns(ns);
> > + return ret;
> > }
>
> wouldn't this allow a userns root to move a task in the same userns into
> a parent cgroup? I believe that anything but moving down the hierarchy
> would be very complicated to verify (how far up can you go).

But only if they are able to open the tasks file for writing, which
they shouldn't be able to do, right?

> > For the second, the hierarchy support should let us ignore concerns
> > about unprivileged users escalating privilege, but I'm trying to decide
> > whether we need to worry about the sendmail capability class of bugs.
>
> You have a pointer for more information on those?

Darn - unfortunately the best description of it, which was at
http://userweb.kernel.org/~morgan/sendmail-capabilities-war-story.html
is no longer there since userweb was taken down, and it was never
captured by archive.org. There's a brief description in
http://lwn.net/Articles/280279/ at the paragraph starting with "The
memory of the sendmail-capabilities bug from 2000..."

> > My sense is actually the answer is no, and we can drop the capable()
> > check altogether. The reason is that while userspace frequently doesn't
> > properly handle a failing system call due to unexpected lack of partial
> > privilege, I wouldn't expect any setuid root program to ignore failure
> > to open or mknod a device file (and proceed into a bad failure mode).
> > Does this sound reasonable, or a recipe for disaster?
>
> The second case sounds ok to me

-serge

2013-05-14 21:02:17

by Eric W. Biederman

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

Serge Hallyn <[email protected]> writes:

> Quoting Aristeu Rozanski ([email protected]):
>> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
>> > so now that the device cgroup properly respects hierarchy, not allowing
>> > a cgroup to be given greater permission than its parent, should we consider
>> > relaxing the capability checks?
>> >
>> > There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
>> > devcgroup_can_attach() to protect changing another task's cgroup, and
>> > one in devcgroup_update_access() to protect writes to the devices.allow
>> > and devices.deny files.
>> >
>> > I think the first should be changed to a check for ns_capable() to
>> > the victim's user_ns. Something like
>> >
>> > --- a/security/device_cgroup.c
>> > +++ b/security/device_cgroup.c
>> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
>> > struct cgroup_taskset *set)
>> > {
>> > struct task_struct *task = cgroup_taskset_first(set);
>> > + struct user_namespace *ns;
>> > + int ret = -EPERM;
>> >
>> > - if (current != task && !capable(CAP_SYS_ADMIN))
>> > - return -EPERM;
>> > - return 0;
>> > + if (current == task)
>> > + return 0;
>> > +
>> > + ns = userns_get(task);
>> > + ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
>> > + put_user_ns(ns);
>> > + return ret;
>> > }
>>
>> wouldn't this allow a userns root to move a task in the same userns into
>> a parent cgroup? I believe that anything but moving down the hierarchy
>> would be very complicated to verify (how far up can you go).
>
> But only if they are able to open the tasks file for writing, which
> they shouldn't be able to do, right?

That should be looked at very closely. There are some funny exploits of
setuid root applications writing to files that have required some
additional permission checks on /proc/<pid>/uid_map. I think the
cgroups files may be vulnerable to some of the same kind of exploits.

Certainly we should be verifying that the opener of the file had the
capabilities we are trying to use to avoid being open to those kinds of
problems.
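
A small userspace model of that idea, checking the opener's privileges as
well as the writer's, might look like the following. This is a sketch
only: struct cred_model, struct open_file, model_open() and model_write()
are hypothetical names for this example and do not correspond to the
cgroup code.

```c
#include <stdbool.h>

/* Hypothetical credential and open-file records, for illustration only. */
struct cred_model {
	unsigned uid;
	bool cap_sys_admin;
};

struct open_file {
	struct cred_model opener;	/* snapshot taken at open time */
};

/* Record who opened the file; a check at write time cannot recover it. */
static struct open_file model_open(const struct cred_model *who)
{
	struct open_file f = { .opener = *who };
	return f;
}

/* Permit the write only if both the opener and the writer are privileged. */
static int model_write(const struct open_file *f,
		       const struct cred_model *writer)
{
	if (!f->opener.cap_sys_admin)
		return -1;	/* descriptor was opened by an unprivileged task */
	if (!writer->cap_sys_admin)
		return -1;
	return 0;
}
```

The case this guards against is an unprivileged task opening the file and
then handing the descriptor to a setuid root program that writes to it:
the writer is privileged, the opener is not, and the model refuses.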

I am trying to see the utility of the proposed patch. It doesn't
allow mknod. So what is the benefit of having the user namespace bits?

Is the point to allow the userns root to remove access to selected
devices from its children even if the DAC permissions would allow the
access?

>> > For the second, the hierarchy support should let us ignore concerns
>> > about unprivileged users escalating privilege, but I'm trying to decide
>> > whether we need to worry about the sendmail capability class of bugs.
>>
>> You have a pointer for more information on those?
>
> Darn - unfortunately the best description of it, which was at
> http://userweb.kernel.org/~morgan/sendmail-capabilities-war-story.html
> is no longer there since userweb was taken down, and it was never
> captured by archive.org. There's a brief description in
> http://lwn.net/Articles/280279/ at the paragraph starting with "The
> memory of the sendmail-capabilities bug from 2000..."

I think the sendmail website has something as well. The core problem
was programs not dealing safely with having some of their privileges
removed. As I recall an unprivileged attacker at one point could drop
the CAP_SETUID on sendmail and get it to deliver mail as root.

I took a good hard look at this issue before implementing limits on
setuid in the user namespace and it appears that the authors of setuid
applications learned from this as well and I could not find a single
program that would call setuid without checking its return code.
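
For reference, the defensive pattern being described, checking the result
of setuid() rather than assuming it worked, can be sketched as below.
drop_to_uid() is a helper name made up for this example.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Drop to `uid` and confirm that the drop took effect.
 * Returns 0 on success, -1 if the caller must not continue.
 */
static int drop_to_uid(uid_t uid)
{
	if (setuid(uid) != 0)
		return -1;	/* never carry on with the old identity */
	if (getuid() != uid || geteuid() != uid)
		return -1;	/* verify the result as well */
	return 0;
}
```

A caller would treat a -1 return as fatal and exit, instead of continuing
to run with whatever identity it still has.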

That said I haven't looked at open or mknod, and usually we are talking
about calls that aren't made by suid apps so I think there is a fair
chance that dropping some of those permissions could cause issues.

The first danger that crosses my mind is what happens if you remove
access to /dev/tty from a normal application that would try to log
strange goings-on to a user if it could.

Shrug mostly I don't see the advantage of this change.

Eric


>> > My sense is actually the answer is no, and we can drop the capable()
>> > check altogether. The reason is that while userspace frequently doesn't
>> > properly handle a failing system call due to unexpected lack of partial
>> > privilege, I wouldn't expect any setuid root program to ignore failure
>> > to open or mknod a device file (and proceed into a bad failure mode).
>> > Does this sound reasonable, or a recipe for disaster?
>>
>> The second case sounds ok to me
>
> -serge

2013-05-16 01:13:01

by Serge E. Hallyn

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

Quoting Eric W. Biederman ([email protected]):
> Serge Hallyn <[email protected]> writes:
>
> > Quoting Aristeu Rozanski ([email protected]):
> >> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> >> > so now that the device cgroup properly respects hierarchy, not allowing
> >> > a cgroup to be given greater permission than its parent, should we consider
> >> > relaxing the capability checks?
> >> >
> >> > There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
> >> > devcgroup_can_attach() to protect changing another task's cgroup, and
> >> > one in devcgroup_update_access() to protect writes to the devices.allow
> >> > and devices.deny files.
> >> >
> >> > I think the first should be changed to a check for ns_capable() to
> >> > the victim's user_ns. Something like
> >> >
> >> > --- a/security/device_cgroup.c
> >> > +++ b/security/device_cgroup.c
> >> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> >> > struct cgroup_taskset *set)
> >> > {
> >> > struct task_struct *task = cgroup_taskset_first(set);
> >> > + struct user_namespace *ns;
> >> > + int ret = -EPERM;
> >> >
> >> > - if (current != task && !capable(CAP_SYS_ADMIN))
> >> > - return -EPERM;
> >> > - return 0;
> >> > + if (current == task)
> >> > + return 0;
> >> > +
> >> > + ns = userns_get(task);
> >> > + ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> >> > + put_user_ns(ns);
> >> > + return ret;
> >> > }
> >>
> >> wouldn't this allow a userns root to move a task in the same userns into
> >> a parent cgroup? I believe that anything but moving down the hierarchy
> >> would be very complicated to verify (how far up can you go).
> >
> > But only if they are able to open the tasks file for writing, which
> > they shouldn't be able to do, right?
>
> That should be looked at very closely. There are some funny exploits of
> setuid root applications writing to files that have required some
> additional permission checks on /proc/<pid>/uid_map. I think the
> cgroups files may be vulnerable to some of the same kind of exploits.
>
> Certainly we should be verifying that the opener of the file had the
> capabilities we are trying to use to avoid being open to those kinds of
> problems.
>
> I am trying to see the utility of the proposed patch. It doesn't
> allow mknod. So what is the benefit of having the user namespace bits?

I'm still thinking through it, which is why I haven't sent a real
patch. What I'm working on is the unprivileged startup of a container.
Right now most things are not allowed in a private user ns, so device
cgroup is not as useful. But it should be possible eventually to use
block devices, which the original unprivileged user owned, by chowning
the blockdev to a user mapped into the target userns.

The unprivileged user may want to use devices cgroup so he can chown
the loop file into the container, but only allow read-only mounts, for
instance.
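
As a sketch of what such a restriction looks like in practice: a device
cgroup rule is a line of the form "type major:minor access" written to
devices.allow or devices.deny. The helper below only formats such a line.
format_dev_rule() is a made-up name, and using loop0 (block major 7,
minor 0) is just an assumed example device.

```c
#include <stdio.h>

/*
 * Format one device cgroup rule as "<type> <major>:<minor> <access>",
 * the syntax accepted by devices.allow and devices.deny.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_dev_rule(char *buf, size_t len, char type,
			   unsigned major, unsigned minor, const char *access)
{
	int n = snprintf(buf, len, "%c %u:%u %s", type, major, minor, access);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling format_dev_rule(buf, sizeof(buf), 'b', 7, 0, "r") produces the
rule "b 7:0 r", which grants read access only to that block device.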

> Is the point to allow the userns root to remove access to selected
> devices from its children even if the DAC permissions would allow the
> access?

Yes I think that's it - except userns root before forking the container
init (and venturing into the really untrusted category).

...

> That said I haven't looked at open or mknod, and usually we are talking
> about calls that aren't made by suid apps so I think there is a fair
> chance that dropping some of those permissions could cause issues.
> The first danger that crosses my mind is what happens if you remove
> access to /dev/tty from a normal application that would try to log
> strange goings-on to a user if it could.

If they were going to do that over tty, that would be to the malicious
user anyway, so that should just either be ignored, or result in the
program exiting early.
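
The handling described here, treating a failed open of the terminal as
something to skip or bail out on, could be sketched as follows.
notify_on_device() is a hypothetical helper for this example, and the
path is a parameter so that it is not tied to /dev/tty.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to write `msg` to the device at `path` (for example "/dev/tty").
 * Returns 0 on success and -1 if the device cannot be opened or written,
 * so the caller can skip the notice or exit early.
 */
static int notify_on_device(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	int fd = open(path, O_WRONLY | O_NOCTTY);

	if (fd < 0)
		return -1;	/* e.g. access denied by the device cgroup */
	ssize_t n = write(fd, msg, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}
```

What matters is that the caller looks at the return value and decides
explicitly between ignoring the failure and exiting.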

> Shrug mostly I don't see the advantage of this change.

It's also possible that this will end up being worked around by the new
(not-yet-designed) interface/library which Tejun wants people to use,
sitting above the cgroupfs. At least at a first layer.

Anyway this isn't urgent, as it's not in the way for general unprivileged
container creation. But in general if we don't need the check to be
capable(), it would be better to introduce the right check.

-serge

2013-05-16 01:22:08

by Serge E. Hallyn

Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset

Quoting Serge E. Hallyn ([email protected]):
> Quoting Eric W. Biederman ([email protected]):
> > Serge Hallyn <[email protected]> writes:
> >
> > > Quoting Aristeu Rozanski ([email protected]):
> > >> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> > >> > so now that the device cgroup properly respects hierarchy, not allowing
> > >> > a cgroup to be given greater permission than its parent, should we consider
> > >> > relaxing the capability checks?
> > >> >
> > >> > There are two capable(CAP_SYS_ADMIN) checks in device_cgroup.c: one in
> > >> > devcgroup_can_attach() to protect changing another task's cgroup, and
> > >> > one in devcgroup_update_access() to protect writes to the devices.allow
> > >> > and devices.deny files.
> > >> >
> > >> > I think the first should be changed to a check for ns_capable() to
> > >> > the victim's user_ns. Something like
> > >> >
> > >> > --- a/security/device_cgroup.c
> > >> > +++ b/security/device_cgroup.c
> > >> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> > >> > struct cgroup_taskset *set)
> > >> > {
> > >> > struct task_struct *task = cgroup_taskset_first(set);
> > >> > + struct user_namespace *ns;
> > >> > + int ret = -EPERM;
> > >> >
> > >> > - if (current != task && !capable(CAP_SYS_ADMIN))
> > >> > - return -EPERM;
> > >> > - return 0;
> > >> > + if (current == task)
> > >> > + return 0;
> > >> > +
> > >> > + ns = userns_get(task);
> > >> > + ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> > >> > + put_user_ns(ns);
> > >> > + return ret;
> > >> > }
> > >>
> > >> wouldn't this allow a userns root to move a task in the same userns into
> > >> a parent cgroup? I believe that anything but moving down the hierarchy
> > >> would be very complicated to verify (how far up can you go).
> > >
> > > But only if they are able to open the tasks file for writing, which
> > > they shouldn't be able to do, right?
> >
> > That should be looked at very closely. There are some funny exploits of
> > setuid root applications writing to files that have required some
> > additional permission checks on /proc/<pid>/uid_map. I think the
> > cgroups files may be vulnerable to some of the same kind of exploits.
> >
> > Certainly we should be verifying that the opener of the file had the
> > capabilities we are trying to use to avoid being open to those kinds of
> > problems.
> >
> > I am trying to see the utility of the proposed patch. It doesn't
> > allow mknod. So what is the benefit of having the user namespace bits?
>
> I'm still thinking through it, which is why I haven't sent a real
> patch. What I'm working on is the unprivileged startup of a container.
> Right now most things are not allowed in a private user ns, so device
> cgroup is not as useful. But it should be possible eventually to use
> block devices, which the original unprivileged user owned, by chowning
> the blockdev to a user mapped into the target userns.
>
> The unprivileged user may want to use devices cgroup so he can chown
> the loop file into the container, but only allow read-only mounts, for
> instance.
>
> > Is the point to allow the userns root to remove access to selected
> > devices from its children even if the DAC permissions would allow the
> > access?
>
> Yes I think that's it - except userns root before forking the container
> init (and venturing into the really untrusted category).
>
> ...
>
> > That said I haven't looked at open or mknod, and usually we are talking
> > about calls that aren't made by suid apps so I think there is a fair
> > chance that dropping some of those permissions could cause issues.
> > The first danger that crosses my mind is what happens if you remove
> > access to /dev/tty from a normal application that would try to log
> > strange goings-on to a user if it could.
>
> If they were going to do that over tty, that would be to the malicious
> user anyway, so that should just either be ignored, or result in the
> program exiting early.
>
> > Shrug mostly I don't see the advantage of this change.
>
> It's also possible that this will end up being worked around by the new
> (not-yet-designed) interface/library which Tejun wants people to use,
> sitting above the cgroupfs. At least at a first layer.
>
> Anyway this isn't urgent, as it's not in the way for general unprivileged
> container creation. But in general if we don't need the check to be
> capable(), it would be better to introduce the right check.
>
> -serge

I'm terribly sorry, Andrew, I have no idea how that address for you got
into my address book. (Corrected) fwiw the thread can be followed at
https://lkml.org/lkml/2013/5/14/363 .

-serge