2016-11-08 23:28:56

by John Stultz

[permalink] [raw]
Subject: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

This patch adds logic to allows a process to migrate other tasks
between cgroups if they have CAP_SYS_RESOURCE.

In Android (where this feature originated), the ActivityManager tracks
various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
etc), and then as applications change states, the SchedPolicy logic
will migrate the application tasks between different cgroups used
to control the different application states (for example, there is a
background cpuset cgroup which can limit background tasks to stay
on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
then further limit those background tasks to a small percentage of
that one cpu's cpu time).

However, for security reasons, Android doesn't want to make the
system_server (the process that runs the ActivityManager and
SchedPolicy logic), run as root. So in the Android common.git
kernel, they have some logic to allow cgroups to loosen their
permissions so CAP_SYS_NICE tasks can migrate other tasks between
cgroups.

I feel the approach taken there overloads CAP_SYS_NICE a bit much
for non-android environments.

So this patch, as suggested by Michael Kerrisk, simply adds a
check for CAP_SYS_RESOURCE.

I've tested this with AOSP master, and this seems to work well
as Zygote and system_server already use CAP_SYS_RESOURCE. I've
also submitted patches against the android-4.4 kernel to change
it to use CAP_SYS_RESOURCE, and the Android developers just merged
it.

Cc: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: Android Kernel Team <[email protected]>
Cc: Rom Lemarchand <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Dmitry Shmidt <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Christian Poetzsch <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: [email protected]
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
v2: Renamed to just CAP_CGROUP_MIGRATE as recommended by Tejun
v3: Switched to just using CAP_SYS_RESOURCE as suggested by Michael
v4: Send out properly folded down version of the patch. :P
---
kernel/cgroup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 85bc9be..866059a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
*/
if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
!uid_eq(cred->euid, tcred->uid) &&
- !uid_eq(cred->euid, tcred->suid))
+ !uid_eq(cred->euid, tcred->suid) &&
+ !ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
ret = -EACCES;

if (!ret && cgroup_on_dfl(dst_cgrp)) {
--
2.7.4


2016-11-08 23:41:47

by Kees Cook

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <[email protected]> wrote:
> This patch adds logic to allows a process to migrate other tasks
> between cgroups if they have CAP_SYS_RESOURCE.
>
> In Android (where this feature originated), the ActivityManager tracks
> various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
> etc), and then as applications change states, the SchedPolicy logic
> will migrate the application tasks between different cgroups used
> to control the different application states (for example, there is a
> background cpuset cgroup which can limit background tasks to stay
> on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
> then further limit those background tasks to a small percentage of
> that one cpu's cpu time).
>
> However, for security reasons, Android doesn't want to make the
> system_server (the process that runs the ActivityManager and
> SchedPolicy logic), run as root. So in the Android common.git
> kernel, they have some logic to allow cgroups to loosen their
> permissions so CAP_SYS_NICE tasks can migrate other tasks between
> cgroups.
>
> I feel the approach taken there overloads CAP_SYS_NICE a bit much
> for non-android environments.
>
> So this patch, as suggested by Michael Kerrisk, simply adds a
> check for CAP_SYS_RESOURCE.
>
> I've tested this with AOSP master, and this seems to work well
> as Zygote and system_server already use CAP_SYS_RESOURCE. I've
> also submitted patches against the android-4.4 kernel to change
> it to use CAP_SYS_RESOURCE, and the Android developers just merged
> it.
>
> Cc: Tejun Heo <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: [email protected]
> Cc: Android Kernel Team <[email protected]>
> Cc: Rom Lemarchand <[email protected]>
> Cc: Colin Cross <[email protected]>
> Cc: Dmitry Shmidt <[email protected]>
> Cc: Todd Kjos <[email protected]>
> Cc: Christian Poetzsch <[email protected]>
> Cc: Amit Pundir <[email protected]>
> Cc: Dmitry Torokhov <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Serge E. Hallyn <[email protected]>
> Cc: [email protected]
> Acked-by: Serge Hallyn <[email protected]>
> Signed-off-by: John Stultz <[email protected]>
> ---
> v2: Renamed to just CAP_CGROUP_MIGRATE as recommended by Tejun
> v3: Switched to just using CAP_SYS_RESOURCE as suggested by Michael
> v4: Send out properly folded down version of the patch. :P
> ---
> kernel/cgroup.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 85bc9be..866059a 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
> */
> if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
> !uid_eq(cred->euid, tcred->uid) &&
> - !uid_eq(cred->euid, tcred->suid))
> + !uid_eq(cred->euid, tcred->suid) &&
> + !ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
> ret = -EACCES;
>
> if (!ret && cgroup_on_dfl(dst_cgrp)) {
> --
> 2.7.4
>

Reviewed-by: Kees Cook <[email protected]>

-Kees

--
Kees Cook
Nexus Security

2016-11-08 23:52:12

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <[email protected]> wrote:
> This patch adds logic to allows a process to migrate other tasks
> between cgroups if they have CAP_SYS_RESOURCE.
>
> In Android (where this feature originated), the ActivityManager tracks
> various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
> etc), and then as applications change states, the SchedPolicy logic
> will migrate the application tasks between different cgroups used
> to control the different application states (for example, there is a
> background cpuset cgroup which can limit background tasks to stay
> on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
> then further limit those background tasks to a small percentage of
> that one cpu's cpu time).
>
> However, for security reasons, Android doesn't want to make the
> system_server (the process that runs the ActivityManager and
> SchedPolicy logic), run as root. So in the Android common.git
> kernel, they have some logic to allow cgroups to loosen their
> permissions so CAP_SYS_NICE tasks can migrate other tasks between
> cgroups.
>
> I feel the approach taken there overloads CAP_SYS_NICE a bit much
> for non-android environments.
>
> So this patch, as suggested by Michael Kerrisk, simply adds a
> check for CAP_SYS_RESOURCE.
>
> I've tested this with AOSP master, and this seems to work well
> as Zygote and system_server already use CAP_SYS_RESOURCE. I've
> also submitted patches against the android-4.4 kernel to change
> it to use CAP_SYS_RESOURCE, and the Android developers just merged
> it.
>

I hate to say it, but I think I may see a problem. Current
developments are afoot to make cgroups do more than resource control.
For example, there's Landlock and there's Daniel's ingress/egress
filter thing. Current cgroup controllers can mostly just DoS their
controlled processes. These new controllers (or controller-like
things) can exfiltrate data and change semantics.

Does anyone have a security model in mind for these controllers and
the cgroups that they're attached to? I'm reasonably confident that
CAP_SYS_RESOURCE is not the answer...

2016-11-09 00:04:37

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
> On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <[email protected]> wrote:
> > This patch adds logic to allows a process to migrate other tasks
> > between cgroups if they have CAP_SYS_RESOURCE.
> >
> > In Android (where this feature originated), the ActivityManager tracks
> > various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
> > etc), and then as applications change states, the SchedPolicy logic
> > will migrate the application tasks between different cgroups used
> > to control the different application states (for example, there is a
> > background cpuset cgroup which can limit background tasks to stay
> > on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
> > then further limit those background tasks to a small percentage of
> > that one cpu's cpu time).
> >
> > However, for security reasons, Android doesn't want to make the
> > system_server (the process that runs the ActivityManager and
> > SchedPolicy logic), run as root. So in the Android common.git
> > kernel, they have some logic to allow cgroups to loosen their
> > permissions so CAP_SYS_NICE tasks can migrate other tasks between
> > cgroups.
> >
> > I feel the approach taken there overloads CAP_SYS_NICE a bit much
> > for non-android environments.
> >
> > So this patch, as suggested by Michael Kerrisk, simply adds a
> > check for CAP_SYS_RESOURCE.
> >
> > I've tested this with AOSP master, and this seems to work well
> > as Zygote and system_server already use CAP_SYS_RESOURCE. I've
> > also submitted patches against the android-4.4 kernel to change
> > it to use CAP_SYS_RESOURCE, and the Android developers just merged
> > it.
> >
>
> I hate to say it, but I think I may see a problem. Current
> developments are afoot to make cgroups do more than resource control.
> For example, there's Landlock and there's Daniel's ingress/egress
> filter thing. Current cgroup controllers can mostly just DoS their
> controlled processes. These new controllers (or controller-like
> things) can exfiltrate data and change semantics.
>
> Does anyone have a security model in mind for these controllers and
> the cgroups that they're attached to? I'm reasonably confident that
> CAP_SYS_RESOURCE is not the answer...

and specifically the answer is... ?
Also would be great if you start with specifying the question first
and the problem you're trying to solve.

2016-11-09 00:12:52

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
<[email protected]> wrote:
> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>> On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <[email protected]> wrote:
>> > This patch adds logic to allows a process to migrate other tasks
>> > between cgroups if they have CAP_SYS_RESOURCE.
>> >
>> > In Android (where this feature originated), the ActivityManager tracks
>> > various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
>> > etc), and then as applications change states, the SchedPolicy logic
>> > will migrate the application tasks between different cgroups used
>> > to control the different application states (for example, there is a
>> > background cpuset cgroup which can limit background tasks to stay
>> > on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
>> > then further limit those background tasks to a small percentage of
>> > that one cpu's cpu time).
>> >
>> > However, for security reasons, Android doesn't want to make the
>> > system_server (the process that runs the ActivityManager and
>> > SchedPolicy logic), run as root. So in the Android common.git
>> > kernel, they have some logic to allow cgroups to loosen their
>> > permissions so CAP_SYS_NICE tasks can migrate other tasks between
>> > cgroups.
>> >
>> > I feel the approach taken there overloads CAP_SYS_NICE a bit much
>> > for non-android environments.
>> >
>> > So this patch, as suggested by Michael Kerrisk, simply adds a
>> > check for CAP_SYS_RESOURCE.
>> >
>> > I've tested this with AOSP master, and this seems to work well
>> > as Zygote and system_server already use CAP_SYS_RESOURCE. I've
>> > also submitted patches against the android-4.4 kernel to change
>> > it to use CAP_SYS_RESOURCE, and the Android developers just merged
>> > it.
>> >
>>
>> I hate to say it, but I think I may see a problem. Current
>> developments are afoot to make cgroups do more than resource control.
>> For example, there's Landlock and there's Daniel's ingress/egress
>> filter thing. Current cgroup controllers can mostly just DoS their
>> controlled processes. These new controllers (or controller-like
>> things) can exfiltrate data and change semantics.
>>
>> Does anyone have a security model in mind for these controllers and
>> the cgroups that they're attached to? I'm reasonably confident that
>> CAP_SYS_RESOURCE is not the answer...
>
> and specifically the answer is... ?
> Also would be great if you start with specifying the question first
> and the problem you're trying to solve.
>

I don't have a good answer right now. Here are some constraints, though:

1. An insufficiently privileged process should not be able to move a
victim into a dangerous cgroup.

2. An insufficiently privileged process should not be able to move
itself into a dangerous cgroup and then use execve to gain privilege
such that the execve'd program can be compromised.

3. An insufficiently privileged process should not be able to make an
existing cgroup dangerous in a way that could compromise a victim in
that cgroup.

4. An insufficiently privileged process should not be able to make a
cgroup dangerous in a way that bypasses protections that would
otherwise protect execve() as used by itself or some other process in
that cgroup.

Keep in mind that "dangerous" may apply to a cgroup's descendents in
addition to the cgroup being controlled.

2016-11-23 00:59:36

by John Stultz

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
> <[email protected]> wrote:
>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>
>>> I hate to say it, but I think I may see a problem. Current
>>> developments are afoot to make cgroups do more than resource control.
>>> For example, there's Landlock and there's Daniel's ingress/egress
>>> filter thing. Current cgroup controllers can mostly just DoS their
>>> controlled processes. These new controllers (or controller-like
>>> things) can exfiltrate data and change semantics.
>>>
>>> Does anyone have a security model in mind for these controllers and
>>> the cgroups that they're attached to? I'm reasonably confident that
>>> CAP_SYS_RESOURCE is not the answer...
>>
>> and specifically the answer is... ?
>> Also would be great if you start with specifying the question first
>> and the problem you're trying to solve.
>>
>
> I don't have a good answer right now. Here are some constraints, though:
>
> 1. An insufficiently privileged process should not be able to move a
> victim into a dangerous cgroup.
>
> 2. An insufficiently privileged process should not be able to move
> itself into a dangerous cgroup and then use execve to gain privilege
> such that the execve'd program can be compromised.
>
> 3. An insufficiently privileged process should not be able to make an
> existing cgroup dangerous in a way that could compromise a victim in
> that cgroup.
>
> 4. An insufficiently privileged process should not be able to make a
> cgroup dangerous in a way that bypasses protections that would
> otherwise protect execve() as used by itself or some other process in
> that cgroup.
>
> Keep in mind that "dangerous" may apply to a cgroup's descendents in
> addition to the cgroup being controlled.

Sorry for taking awhile to get back to you here. I'm a little
befuddled as to what next steps I should consider (and honestly, I'm
not totally sure I really grok your concern here, particularly what
you mean with "dangrous cgroups").

So is going back to the CAP_CGROUP_MIGRATE approach (to properly
separate "sufficiently" from "insufficiently privileged") better?

Or something closer to the original method Android used of each cgroup
having an allow_attach() check which could determine what is
sufficiently privledged for the respective level of danger the cgroup
might poise?

Or just stepping back, what method would you imagine to be reasonable
to allow a specified task to migrate other tasks between cgroups
without it having to be root/suid?

thanks
-john

2016-12-06 00:28:58

by John Stultz

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <[email protected]> wrote:
> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <[email protected]> wrote:
>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>> <[email protected]> wrote:
>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>
>>>> I hate to say it, but I think I may see a problem. Current
>>>> developments are afoot to make cgroups do more than resource control.
>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>> filter thing. Current cgroup controllers can mostly just DoS their
>>>> controlled processes. These new controllers (or controller-like
>>>> things) can exfiltrate data and change semantics.
>>>>
>>>> Does anyone have a security model in mind for these controllers and
>>>> the cgroups that they're attached to? I'm reasonably confident that
>>>> CAP_SYS_RESOURCE is not the answer...
>>>
>>> and specifically the answer is... ?
>>> Also would be great if you start with specifying the question first
>>> and the problem you're trying to solve.
>>>
>>
>> I don't have a good answer right now. Here are some constraints, though:
>>
>> 1. An insufficiently privileged process should not be able to move a
>> victim into a dangerous cgroup.
>>
>> 2. An insufficiently privileged process should not be able to move
>> itself into a dangerous cgroup and then use execve to gain privilege
>> such that the execve'd program can be compromised.
>>
>> 3. An insufficiently privileged process should not be able to make an
>> existing cgroup dangerous in a way that could compromise a victim in
>> that cgroup.
>>
>> 4. An insufficiently privileged process should not be able to make a
>> cgroup dangerous in a way that bypasses protections that would
>> otherwise protect execve() as used by itself or some other process in
>> that cgroup.
>>
>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>> addition to the cgroup being controlled.
>
> Sorry for taking awhile to get back to you here. I'm a little
> befuddled as to what next steps I should consider (and honestly, I'm
> not totally sure I really grok your concern here, particularly what
> you mean with "dangrous cgroups").
>
> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> separate "sufficiently" from "insufficiently privileged") better?
>
> Or something closer to the original method Android used of each cgroup
> having an allow_attach() check which could determine what is
> sufficiently privledged for the respective level of danger the cgroup
> might poise?
>
> Or just stepping back, what method would you imagine to be reasonable
> to allow a specified task to migrate other tasks between cgroups
> without it having to be root/suid?

Any suggested feedback here?

thanks
-john

2016-12-06 00:37:25

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Mon, Dec 5, 2016 at 4:28 PM, John Stultz <[email protected]> wrote:
> On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <[email protected]> wrote:
>> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>>> <[email protected]> wrote:
>>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>>
>>>>> I hate to say it, but I think I may see a problem. Current
>>>>> developments are afoot to make cgroups do more than resource control.
>>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>>> filter thing. Current cgroup controllers can mostly just DoS their
>>>>> controlled processes. These new controllers (or controller-like
>>>>> things) can exfiltrate data and change semantics.
>>>>>
>>>>> Does anyone have a security model in mind for these controllers and
>>>>> the cgroups that they're attached to? I'm reasonably confident that
>>>>> CAP_SYS_RESOURCE is not the answer...
>>>>
>>>> and specifically the answer is... ?
>>>> Also would be great if you start with specifying the question first
>>>> and the problem you're trying to solve.
>>>>
>>>
>>> I don't have a good answer right now. Here are some constraints, though:
>>>
>>> 1. An insufficiently privileged process should not be able to move a
>>> victim into a dangerous cgroup.
>>>
>>> 2. An insufficiently privileged process should not be able to move
>>> itself into a dangerous cgroup and then use execve to gain privilege
>>> such that the execve'd program can be compromised.
>>>
>>> 3. An insufficiently privileged process should not be able to make an
>>> existing cgroup dangerous in a way that could compromise a victim in
>>> that cgroup.
>>>
>>> 4. An insufficiently privileged process should not be able to make a
>>> cgroup dangerous in a way that bypasses protections that would
>>> otherwise protect execve() as used by itself or some other process in
>>> that cgroup.
>>>
>>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>>> addition to the cgroup being controlled.
>>
>> Sorry for taking awhile to get back to you here. I'm a little
>> befuddled as to what next steps I should consider (and honestly, I'm
>> not totally sure I really grok your concern here, particularly what
>> you mean with "dangrous cgroups").
>>
>> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
>> separate "sufficiently" from "insufficiently privileged") better?
>>
>> Or something closer to the original method Android used of each cgroup
>> having an allow_attach() check which could determine what is
>> sufficiently privledged for the respective level of danger the cgroup
>> might poise?
>>
>> Or just stepping back, what method would you imagine to be reasonable
>> to allow a specified task to migrate other tasks between cgroups
>> without it having to be root/suid?
>
> Any suggested feedback here?

I really don't know. The cgroupfs interface is a bit unfortunate in
that it doesn't really express the constraints. To safely migrate a
task, ISTM you ought to have some form of privilege over the task
*and* some form of privilege over the cgroup. cgroupfs only handles
the latter.

CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain
a concept of "dangerous" cgroups and further restrict them and
CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I
favor the latter, but it might be nice to hear from Tejun first.

--Andy

2016-12-06 02:00:20

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 5, 2016 at 4:28 PM, John Stultz <[email protected]> wrote:
> > On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <[email protected]> wrote:
> >> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <[email protected]> wrote:
> >>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
> >>> <[email protected]> wrote:
> >>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
> >>>>>
> >>>>> I hate to say it, but I think I may see a problem. Current
> >>>>> developments are afoot to make cgroups do more than resource control.
> >>>>> For example, there's Landlock and there's Daniel's ingress/egress
> >>>>> filter thing. Current cgroup controllers can mostly just DoS their
> >>>>> controlled processes. These new controllers (or controller-like
> >>>>> things) can exfiltrate data and change semantics.
> >>>>>
> >>>>> Does anyone have a security model in mind for these controllers and
> >>>>> the cgroups that they're attached to? I'm reasonably confident that
> >>>>> CAP_SYS_RESOURCE is not the answer...
> >>>>
> >>>> and specifically the answer is... ?
> >>>> Also would be great if you start with specifying the question first
> >>>> and the problem you're trying to solve.
> >>>>
> >>>
> >>> I don't have a good answer right now. Here are some constraints, though:
> >>>
> >>> 1. An insufficiently privileged process should not be able to move a
> >>> victim into a dangerous cgroup.
> >>>
> >>> 2. An insufficiently privileged process should not be able to move
> >>> itself into a dangerous cgroup and then use execve to gain privilege
> >>> such that the execve'd program can be compromised.
> >>>
> >>> 3. An insufficiently privileged process should not be able to make an
> >>> existing cgroup dangerous in a way that could compromise a victim in
> >>> that cgroup.
> >>>
> >>> 4. An insufficiently privileged process should not be able to make a
> >>> cgroup dangerous in a way that bypasses protections that would
> >>> otherwise protect execve() as used by itself or some other process in
> >>> that cgroup.
> >>>
> >>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
> >>> addition to the cgroup being controlled.
> >>
> >> Sorry for taking awhile to get back to you here. I'm a little
> >> befuddled as to what next steps I should consider (and honestly, I'm
> >> not totally sure I really grok your concern here, particularly what
> >> you mean with "dangrous cgroups").
> >>
> >> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> >> separate "sufficiently" from "insufficiently privileged") better?
> >>
> >> Or something closer to the original method Android used of each cgroup
> >> having an allow_attach() check which could determine what is
> >> sufficiently privledged for the respective level of danger the cgroup
> >> might poise?
> >>
> >> Or just stepping back, what method would you imagine to be reasonable
> >> to allow a specified task to migrate other tasks between cgroups
> >> without it having to be root/suid?
> >
> > Any suggested feedback here?
>
> I really don't know. The cgroupfs interface is a bit unfortunate in
> that it doesn't really express the constraints. To safely migrate a
> task, ISTM you ought to have some form of privilege over the task
> *and* some form of privilege over the cgroup.

Agreed. The problem is that the privilege required should depend on
the controller (I guess). For memory and cpuset, CAP_SYS_NICE seems
right. Perhaps CAP_SYS_RESOURCE would be needed for some.. but then,
as I look through the lists (capabilities(7) and the list of controllers),
it seems like CAP_SYS_NICE works for everything. What else would we need?
Maybe CAP_NET_ADMIN for net_cls and net_prio? CAP_SYS_RESOURCE|CAP_SYS_ADMIN
for pids?

> cgroupfs only handles
> the latter.

If we need different checks for different controllers, we can add
checks to cgroupfs.

> CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain
> a concept of "dangerous" cgroups and further restrict them and
> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I
> favor the latter, but it might be nice to hear from Tejun first.
>
> --Andy

2016-12-06 16:55:30

by Tejun Heo

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

Hello,

On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote:
> I really don't know. The cgroupfs interface is a bit unfortunate in
> that it doesn't really express the constraints. To safely migrate a
> task, ISTM you ought to have some form of privilege over the task
> *and* some form of privilege over the cgroup. cgroupfs only handles
> the latter.
>
> CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain
> a concept of "dangerous" cgroups and further restrict them and
> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I
> favor the latter, but it might be nice to hear from Tejun first.

If we can't do CAP_SYS_RESOURCE due to overlaps, let's go with a
separate CAP. While for android and cgroup v1, it's nice to have a
finer grained CAP for security control, privilege isolation in cgroup
should also primarily done through hierarchical delegation. It
doesn't make sense to have another system in parallel.

We can't do it properly on v1 because some controllers aren't properly
hierarchical and delegation model isn't well defined. e.g. nothing
prevents a process from being pulled across different subtrees with
the same delegation, but v2 can do it properly. All that's necessary
is to make the CAP test OR'd to other perm checks instead of AND'ing
so that the CAP just allows overriding restrictions expressed through
delegation but it's normally possible to move processes around in
one's own delegated subtree.

Thanks.

--
tejun

2016-12-06 16:57:51

by Tejun Heo

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

Hello, Serge.

On Mon, Dec 05, 2016 at 08:00:11PM -0600, Serge E. Hallyn wrote:
> > I really don't know. The cgroupfs interface is a bit unfortunate in
> > that it doesn't really express the constraints. To safely migrate a
> > task, ISTM you ought to have some form of privilege over the task
> > *and* some form of privilege over the cgroup.
>
> Agreed. The problem is that the privilege required should depend on
> the controller (I guess). For memory and cpuset, CAP_SYS_NICE seems
> right. Perhaps CAP_SYS_RESOURCE would be needed for some.. but then,
> as I look through the lists (capabilities(7) and the list of controllers),
> it seems like CAP_SYS_NICE works for everything. What else would we need?
> Maybe CAP_NET_ADMIN for net_cls and net_prio? CAP_SYS_RESOURCE|CAP_SYS_ADMIN
> for pids?

Please see my other reply but I don't think it's a good idea to have
these extra checks on the side when there already is hierarchical
delegation mechanism which should be able to handle both resource
control and process management delegation.

Thanks.

--
tejun

2016-12-06 17:01:41

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Dec 6, 2016 at 8:55 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote:
>> I really don't know. The cgroupfs interface is a bit unfortunate in
>> that it doesn't really express the constraints. To safely migrate a
>> task, ISTM you ought to have some form of privilege over the task
>> *and* some form of privilege over the cgroup. cgroupfs only handles
>> the latter.
>>
>> CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain
>> a concept of "dangerous" cgroups and further restrict them and
>> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I
>> favor the latter, but it might be nice to hear from Tejun first.
>
> If we can't do CAP_SYS_RESOURCE due to overlaps, let's go with a
> separate CAP. While for android and cgroup v1, it's nice to have a
> finer grained CAP for security control, privilege isolation in cgroup
> should also primarily done through hierarchical delegation. It
> doesn't make sense to have another system in parallel.
>
> We can't do it properly on v1 because some controllers aren't properly
> hierarchical and delegation model isn't well defined. e.g. nothing
> prevents a process from being pulled across different subtrees with
> the same delegation, but v2 can do it properly. All that's necessary
> is to make the CAP test OR'd to other perm checks instead of AND'ing
> so that the CAP just allows overriding restrictions expressed through
> delegation but it's normally possible to move processes around in
> one's own delegated subtree.

How would one be granted the right to move processes around in one's
own subtree?

Are you imagining that, if you're in /a/b and you want to move a
process that's currently in /a/b/c to /a/b/d then you're allowed to
because the target process is in your tree? If so, I doubt this has
the security properties you want -- namely, if you can cooperate with
anyone in /, even if they're unprivileged, you can break it.

2016-12-06 18:12:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

Hello,

On Tue, Dec 06, 2016 at 09:01:17AM -0800, Andy Lutomirski wrote:
> How would one be granted the right to move processes around in one's
> own subtree?

Through expicit delegation - chowning of the directory and
cgroup.procs file.

> Are you imagining that, if you're in /a/b and you want to move a
> process that's currently in /a/b/c to /a/b/d then you're allowed to
> because the target process is in your tree? If so, I doubt this has
> the security properties you want -- namely, if you can cooperate with
> anyone in /, even if they're unprivileged, you can break it.

Delegation is an explicit operation and reflected in the ownership of
the subdirectories and cgroup interface files in them. The
subhierarchy containment is achieved by requiring the user who's
trying to migrate a process to have write perm on cgroup.procs on the
common ancestor of the source and target in addition to the target.
So, a user who has a subhierarchy delegated to it can move processes
around inside but not out of or into it. Also, if there are multiple
delegated disjoint subhierarchies, processes can't be moved across
them by the delegatee either.

Thanks.

--
tejun

2016-12-06 18:15:06

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Dec 6, 2016 at 10:12 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Dec 06, 2016 at 09:01:17AM -0800, Andy Lutomirski wrote:
>> How would one be granted the right to move processes around in one's
>> own subtree?
>
> Through expicit delegation - chowning of the directory and
> cgroup.procs file.
>
>> Are you imagining that, if you're in /a/b and you want to move a
>> process that's currently in /a/b/c to /a/b/d then you're allowed to
>> because the target process is in your tree? If so, I doubt this has
>> the security properties you want -- namely, if you can cooperate with
>> anyone in /, even if they're unprivileged, you can break it.
>
> Delegation is an explicit operation and reflected in the ownership of
> the subdirectories and cgroup interface files in them. The
> subhierarchy containment is achieved by requiring the user who's
> trying to migrate a process to have write perm on cgroup.procs on the
> common ancestor of the source and target in addition to the target.

OK, I see what you're doing. That's interesting.

2016-12-06 18:23:22

by Tejun Heo

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

Hello,

On Tue, Dec 06, 2016 at 10:13:53AM -0800, Andy Lutomirski wrote:
> > Delegation is an explicit operation and reflected in the ownership of
> > the subdirectories and cgroup interface files in them. The
> > subhierarchy containment is achieved by requiring the user who's
> > trying to migrate a process to have write perm on cgroup.procs on the
> > common ancestor of the source and target in addition to the target.
>
> OK, I see what you're doing. That's interesting.

It's something born out of usages of cgroup v1. People used it that
way (chowning files and directories) and combined with the uid checksn
it yielded something which is useful sometimes, but it always had
issues with hierarchical behaviors, which files to chmod and the weird
combination of uid checks. cgroup v2 has a clear delegation model but
the uid checks are still left in as not changing was the default.

It's not necessary and I'm thinking about queueing something like the
following in the next cycle.

As for the android CAP discussion, I think it'd be nice to share an
existing CAP but if we can't find a good one to share, let's create a
new one.

Thanks.

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 4cc07ce..34b4b44 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -328,14 +328,12 @@ a process with a non-root euid to migrate a target process into a
cgroup by writing its PID to the "cgroup.procs" file, the following
conditions must be met.

-- The writer's euid must match either uid or suid of the target process.
-
- The writer must have write access to the "cgroup.procs" file.

- The writer must have write access to the "cgroup.procs" file of the
common ancestor of the source and destination cgroups.

-The above three constraints ensure that while a delegatee may migrate
+The above two constraints ensure that while a delegatee may migrate
processes around freely in the delegated sub-hierarchy it can't pull
in from or push out to outside the sub-hierarchy.

@@ -350,10 +348,10 @@ all processes under C0 and C1 belong to U0.

Let's also say U0 wants to write the PID of a process which is
currently in C10 into "C00/cgroup.procs". U0 has write access to the
-file and uid match on the process; however, the common ancestor of the
-source cgroup C10 and the destination cgroup C00 is above the points
-of delegation and U0 would not have write access to its "cgroup.procs"
-files and thus the write will be denied with -EACCES.
+file; however, the common ancestor of the source cgroup C10 and the
+destination cgroup C00 is above the points of delegation and U0 would
+not have write access to its "cgroup.procs" files and thus the write
+will be denied with -EACCES.


2-6. Guidelines
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 85bc9be..49384ff 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2854,12 +2854,18 @@ static int cgroup_procs_write_permission(struct task_struct *task,
* even if we're attaching all tasks in the thread group, we only
* need to check permissions on one of them.
*/
- if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
- !uid_eq(cred->euid, tcred->uid) &&
- !uid_eq(cred->euid, tcred->suid))
- ret = -EACCES;

- if (!ret && cgroup_on_dfl(dst_cgrp)) {
+ /* root is allowed to do anything */
+ if (uid_eq(cred->euid, GLOBAL_ROOT_UID))
+ goto out;
+
+ /*
+ * On v2, follow the delegation model. Inside a delegated subtree,
+ * the delegatee can move around the processes however it sees fit.
+ *
+ * On v1, a process should match one of the target's uids.
+ */
+ if (cgroup_on_dfl(dst_cgrp)) {
struct super_block *sb = of->file->f_path.dentry->d_sb;
struct cgroup *cgrp;
struct inode *inode;
@@ -2877,8 +2883,11 @@ static int cgroup_procs_write_permission(struct task_struct *task,
ret = inode_permission(inode, MAY_WRITE);
iput(inode);
}
+ } else if (!uid_eq(cred->euid, tcred->uid) &&
+ !uid_eq(cred->euid, tcred->suid)) {
+ ret = -EACCES;
}
-
+out:
put_cred(tcred);
return ret;
}

2016-12-09 05:39:43

by John Stultz

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

On Tue, Dec 6, 2016 at 10:23 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Dec 06, 2016 at 10:13:53AM -0800, Andy Lutomirski wrote:
>> > Delegation is an explicit operation and reflected in the ownership of
>> > the subdirectories and cgroup interface files in them. The
>> > subhierarchy containment is achieved by requiring the user who's
>> > trying to migrate a process to have write perm on cgroup.procs on the
>> > common ancestor of the source and target in addition to the target.
>>
>> OK, I see what you're doing. That's interesting.
>
> It's something born out of usages of cgroup v1. People used it that
> way (chowning files and directories) and combined with the uid checksn
> it yielded something which is useful sometimes, but it always had
> issues with hierarchical behaviors, which files to chmod and the weird
> combination of uid checks. cgroup v2 has a clear delegation model but
> the uid checks are still left in as not changing was the default.
>
> It's not necessary and I'm thinking about queueing something like the
> following in the next cycle.
>
> As for the android CAP discussion, I think it'd be nice to share an
> existing CAP but if we can't find a good one to share, let's create a
> new one.

So just to clarify the discussion for my purposes and make sure I
understood, per-cgroup CAP rules was not desired, and instead we
should either utilize an existing cap (are there still objections to
CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie,
bring back the older CAP_CGROUP_MIGRATE patch).

Tejun: Do you have a more finished version of your patch that I should
add my changes on top of?

thanks
-john

2016-12-09 13:27:31

by Tejun Heo

[permalink] [raw]
Subject: Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

Hello, John.

On Thu, Dec 08, 2016 at 09:39:38PM -0800, John Stultz wrote:
> So just to clarify the discussion for my purposes and make sure I
> understood, per-cgroup CAP rules was not desired, and instead we
> should either utilize an existing cap (are there still objections to
> CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie,
> bring back the older CAP_CGROUP_MIGRATE patch).

Let's create a new one. It looks to be a bit too different to share
with an existing one.

> Tejun: Do you have a more finished version of your patch that I should
> add my changes on top of?

Oh, just submit the patch on top of the current for-next. I can queue
mine on top of yours. They are mostly orthogonal.

Thanks.

--
tejun