2021-04-22 12:44:02

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 17/19] sched: Inherit task cookie on fork()

Note that sched_core_fork() is called from under tasklist_lock, and
not from sched_fork() earlier. This avoids a few races later.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/sched.h | 2 ++
kernel/fork.c | 3 +++
kernel/sched/core_sched.c | 6 ++++++
3 files changed, 11 insertions(+)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2172,8 +2172,10 @@ const struct cpumask *sched_trace_rd_spa

#ifdef CONFIG_SCHED_CORE
extern void sched_core_free(struct task_struct *tsk);
+extern void sched_core_fork(struct task_struct *p);
#else
static inline void sched_core_free(struct task_struct *tsk) { }
+static inline void sched_core_fork(struct task_struct *p) { }
#endif

#endif
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2249,6 +2249,8 @@ static __latent_entropy struct task_stru

klp_copy_process(p);

+ sched_core_fork(p);
+
spin_lock(&current->sighand->siglock);

/*
@@ -2336,6 +2338,7 @@ static __latent_entropy struct task_stru
return p;

bad_fork_cancel_cgroup:
+ sched_core_free(p);
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
cgroup_cancel_fork(p, args);
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -100,6 +100,12 @@ static unsigned long sched_core_clone_co
return cookie;
}

+void sched_core_fork(struct task_struct *p)
+{
+ RB_CLEAR_NODE(&p->core_node);
+ p->core_cookie = sched_core_clone_cookie(current);
+}
+
void sched_core_free(struct task_struct *p)
{
sched_core_put_cookie(p->core_cookie);



2021-05-10 16:10:56

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

Hi Peter,

On Thu, Apr 22, 2021 at 8:36 AM Peter Zijlstra <[email protected]> wrote:
>
> Note that sched_core_fork() is called from under tasklist_lock, and
> not from sched_fork() earlier. This avoids a few races later.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> include/linux/sched.h | 2 ++
> kernel/fork.c | 3 +++
> kernel/sched/core_sched.c | 6 ++++++
> 3 files changed, 11 insertions(+)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2172,8 +2172,10 @@ const struct cpumask *sched_trace_rd_spa
>
> #ifdef CONFIG_SCHED_CORE
> extern void sched_core_free(struct task_struct *tsk);
> +extern void sched_core_fork(struct task_struct *p);
> #else
> static inline void sched_core_free(struct task_struct *tsk) { }
> +static inline void sched_core_fork(struct task_struct *p) { }
> #endif
>
> #endif
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2249,6 +2249,8 @@ static __latent_entropy struct task_stru
>
> klp_copy_process(p);
>
> + sched_core_fork(p);
> +
> spin_lock(&current->sighand->siglock);
>
> /*
> @@ -2336,6 +2338,7 @@ static __latent_entropy struct task_stru
> return p;
>
> bad_fork_cancel_cgroup:
> + sched_core_free(p);
> spin_unlock(&current->sighand->siglock);
> write_unlock_irq(&tasklist_lock);
> cgroup_cancel_fork(p, args);
> --- a/kernel/sched/core_sched.c
> +++ b/kernel/sched/core_sched.c
> @@ -100,6 +100,12 @@ static unsigned long sched_core_clone_co
> return cookie;
> }
>
> +void sched_core_fork(struct task_struct *p)
> +{
> + RB_CLEAR_NODE(&p->core_node);
> + p->core_cookie = sched_core_clone_cookie(current);

Does this make sense also for !CLONE_THREAD forks?

With earlier versions of core scheduling, we have done the following
on ChromeOS. Basically, if it is a "thread clone", share the cookie
since memory is shared within a process (same address space within a
process). Otherwise, set the cookie to a new unique cookie so that the
new process does not share core with parent initially (since their
address space will be different).

Example Psedu-ocode in sched_fork():

if (current->core_cookie && (clone_flags & CLONE_THREAD)) {
p->core_cookie = clone_cookie(current);
} else {
p->core_cookie = create_new_cookie();
}

In your version though, I don't see that it always clones the cookie
whether it is a CLONE_THREAD clone or not. Is that correct? I feel
that's a security issue.

-Joel

2021-05-10 16:25:04

by chris hyser

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On 5/10/21 12:06 PM, Joel Fernandes wrote:
> Hi Peter,
>
> On Thu, Apr 22, 2021 at 8:36 AM Peter Zijlstra <[email protected]> wrote:
>>
>> Note that sched_core_fork() is called from under tasklist_lock, and
>> not from sched_fork() earlier. This avoids a few races later.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> ---
>> include/linux/sched.h | 2 ++
>> kernel/fork.c | 3 +++
>> kernel/sched/core_sched.c | 6 ++++++
>> 3 files changed, 11 insertions(+)
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -2172,8 +2172,10 @@ const struct cpumask *sched_trace_rd_spa
>>
>> #ifdef CONFIG_SCHED_CORE
>> extern void sched_core_free(struct task_struct *tsk);
>> +extern void sched_core_fork(struct task_struct *p);
>> #else
>> static inline void sched_core_free(struct task_struct *tsk) { }
>> +static inline void sched_core_fork(struct task_struct *p) { }
>> #endif
>>
>> #endif
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2249,6 +2249,8 @@ static __latent_entropy struct task_stru
>>
>> klp_copy_process(p);
>>
>> + sched_core_fork(p);
>> +
>> spin_lock(&current->sighand->siglock);
>>
>> /*
>> @@ -2336,6 +2338,7 @@ static __latent_entropy struct task_stru
>> return p;
>>
>> bad_fork_cancel_cgroup:
>> + sched_core_free(p);
>> spin_unlock(&current->sighand->siglock);
>> write_unlock_irq(&tasklist_lock);
>> cgroup_cancel_fork(p, args);
>> --- a/kernel/sched/core_sched.c
>> +++ b/kernel/sched/core_sched.c
>> @@ -100,6 +100,12 @@ static unsigned long sched_core_clone_co
>> return cookie;
>> }
>>
>> +void sched_core_fork(struct task_struct *p)
>> +{
>> + RB_CLEAR_NODE(&p->core_node);
>> + p->core_cookie = sched_core_clone_cookie(current);
>
> Does this make sense also for !CLONE_THREAD forks?

Yes. Given the absence of a cgroup interface, fork inheritance (clone the cookie) is the best way to create shared
cookie hierarchies. The security issue you mentioned was handled in my original code by setting a unique cookie on
'exec', but Peter took that out for the reason mentioned above. It was part of the "lets get this in compromise" effort.

-chrish

2021-05-10 20:50:30

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On Mon, May 10, 2021 at 12:23 PM Chris Hyser <[email protected]> wrote:
>
> On 5/10/21 12:06 PM, Joel Fernandes wrote:
> > Hi Peter,
> >
> > On Thu, Apr 22, 2021 at 8:36 AM Peter Zijlstra <[email protected]> wrote:
> >>
> >> Note that sched_core_fork() is called from under tasklist_lock, and
> >> not from sched_fork() earlier. This avoids a few races later.
> >>
> >> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> >> ---
> >> include/linux/sched.h | 2 ++
> >> kernel/fork.c | 3 +++
> >> kernel/sched/core_sched.c | 6 ++++++
> >> 3 files changed, 11 insertions(+)
> >>
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -2172,8 +2172,10 @@ const struct cpumask *sched_trace_rd_spa
> >>
> >> #ifdef CONFIG_SCHED_CORE
> >> extern void sched_core_free(struct task_struct *tsk);
> >> +extern void sched_core_fork(struct task_struct *p);
> >> #else
> >> static inline void sched_core_free(struct task_struct *tsk) { }
> >> +static inline void sched_core_fork(struct task_struct *p) { }
> >> #endif
> >>
> >> #endif
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -2249,6 +2249,8 @@ static __latent_entropy struct task_stru
> >>
> >> klp_copy_process(p);
> >>
> >> + sched_core_fork(p);
> >> +
> >> spin_lock(&current->sighand->siglock);
> >>
> >> /*
> >> @@ -2336,6 +2338,7 @@ static __latent_entropy struct task_stru
> >> return p;
> >>
> >> bad_fork_cancel_cgroup:
> >> + sched_core_free(p);
> >> spin_unlock(&current->sighand->siglock);
> >> write_unlock_irq(&tasklist_lock);
> >> cgroup_cancel_fork(p, args);
> >> --- a/kernel/sched/core_sched.c
> >> +++ b/kernel/sched/core_sched.c
> >> @@ -100,6 +100,12 @@ static unsigned long sched_core_clone_co
> >> return cookie;
> >> }
> >>
> >> +void sched_core_fork(struct task_struct *p)
> >> +{
> >> + RB_CLEAR_NODE(&p->core_node);
> >> + p->core_cookie = sched_core_clone_cookie(current);
> >
> > Does this make sense also for !CLONE_THREAD forks?
>
> Yes. Given the absence of a cgroup interface, fork inheritance (clone the cookie) is the best way to create shared
> cookie hierarchies. The security issue you mentioned was handled in my original code by setting a unique cookie on
> 'exec', but Peter took that out for the reason mentioned above. It was part of the "lets get this in compromise" effort.

Thanks for sharing the history of it. I guess one can argue that this
policy is better to be hardcoded in userspace since core-scheduling
can be used for non-security usecases as well. Maybe one could simply
call the prctl(2) from userspace if they so desire, before calling
exec() ?

- Joel

2021-05-10 21:39:29

by chris hyser

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On 5/10/21 4:47 PM, Joel Fernandes wrote:
> On Mon, May 10, 2021 at 12:23 PM Chris Hyser <[email protected]> wrote:

>>>> +void sched_core_fork(struct task_struct *p)
>>>> +{
>>>> + RB_CLEAR_NODE(&p->core_node);
>>>> + p->core_cookie = sched_core_clone_cookie(current);
>>>
>>> Does this make sense also for !CLONE_THREAD forks?
>>
>> Yes. Given the absence of a cgroup interface, fork inheritance (clone the cookie) is the best way to create shared
>> cookie hierarchies. The security issue you mentioned was handled in my original code by setting a unique cookie on
>> 'exec', but Peter took that out for the reason mentioned above. It was part of the "lets get this in compromise" effort.
>
> Thanks for sharing the history of it. I guess one can argue that this
> policy is better to be hardcoded in userspace since core-scheduling
> can be used for non-security usecases as well. Maybe one could simply
> call the prctl(2) from userspace if they so desire, before calling
> exec() ?

I think the defining use case is a container's init. If the cookie is set for it by the container creator and without
any other user code knowing about core_sched, every descendant spawned will have the same cookie and be in the same
core_sched group much like the cgroup interface had provided. If we create a unique cookie in the kernel either on fork
or exec, we are secure, but we will now have 1000's of core sched groups.

CLEAR was also removed (temporarily, I hope) because a core_sched knowledgeable program in the example core_sched
container group should not be able to remove itself from _all_ core sched groups. It can modify it's cookie, but that is
no different than the normal case.

Both of these beg for a kernel policy, but that discussion was TBD.

-chrish

2021-05-12 09:07:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On Mon, May 10, 2021 at 05:38:18PM -0400, Chris Hyser wrote:
> On 5/10/21 4:47 PM, Joel Fernandes wrote:
> > On Mon, May 10, 2021 at 12:23 PM Chris Hyser <[email protected]> wrote:
>
> > > > > +void sched_core_fork(struct task_struct *p)
> > > > > +{
> > > > > + RB_CLEAR_NODE(&p->core_node);
> > > > > + p->core_cookie = sched_core_clone_cookie(current);
> > > >
> > > > Does this make sense also for !CLONE_THREAD forks?
> > >
> > > Yes. Given the absence of a cgroup interface, fork inheritance
> > > (clone the cookie) is the best way to create shared cookie
> > > hierarchies. The security issue you mentioned was handled in my
> > > original code by setting a unique cookie on 'exec', but Peter took
> > > that out for the reason mentioned above. It was part of the "lets
> > > get this in compromise" effort.

Right, not only that, given all this is moot when parent and child have
the same PTRACE permissions, since if they do, they can inspect one
another's innards anyway, exec()/fork() just fundamentally isn't a
magical boundary we should care about.

The only special case there is SUID exec(), because in that case the
actual credentials change and the PTRACE permissions do actually change.

I sorta had a patch to do that, but it's yuck because that cred change
happens after the point of no return and we need an allocation for the
new cookie. Now, we could rely on the fact that a task context
allocation (GFP_KERNEL) for something as small as our cookie will never
fail and hence we shouldn't be bothered by it, we should do the error
path and yuck.

> > Thanks for sharing the history of it. I guess one can argue that this
> > policy is better to be hardcoded in userspace since core-scheduling
> > can be used for non-security usecases as well. Maybe one could simply
> > call the prctl(2) from userspace if they so desire, before calling
> > exec() ?
>
> I think the defining use case is a container's init. If the cookie is set
> for it by the container creator and without any other user code knowing
> about core_sched, every descendant spawned will have the same cookie and be
> in the same core_sched group much like the cgroup interface had provided. If
> we create a unique cookie in the kernel either on fork or exec, we are
> secure, but we will now have 1000's of core sched groups.
>
> CLEAR was also removed (temporarily, I hope) because a core_sched
> knowledgeable program in the example core_sched container group should not
> be able to remove itself from _all_ core sched groups. It can modify it's
> cookie, but that is no different than the normal case.

Note that much of clear is possible by using SHARE_FROM on your parent
to reset the cookie.

> Both of these beg for a kernel policy, but that discussion was TBD.

Right, I need a Champion that actually cares about cgroups and has
use-cases to go argue with TJ on this. I've proposed code that I think
has sane semantics, but I'm not in a position to argue for it, given I
think a world without cgroups is a better world :-)))

Subject: [tip: sched/core] sched: Inherit task cookie on fork()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 85dd3f61203c5cfa72b308ff327b5fbf3fc1ce5e
Gitweb: https://git.kernel.org/tip/85dd3f61203c5cfa72b308ff327b5fbf3fc1ce5e
Author: Peter Zijlstra <[email protected]>
AuthorDate: Mon, 29 Mar 2021 15:18:35 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 12 May 2021 11:43:31 +02:00

sched: Inherit task cookie on fork()

Note that sched_core_fork() is called from under tasklist_lock, and
not from sched_fork() earlier. This avoids a few races later.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Don Hiatt <[email protected]>
Tested-by: Hongyu Ning <[email protected]>
Tested-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/sched.h | 2 ++
kernel/fork.c | 3 +++
kernel/sched/core_sched.c | 6 ++++++
3 files changed, 11 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eab3f7c..fba47e5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2181,8 +2181,10 @@ const struct cpumask *sched_trace_rd_span(struct root_domain *rd);

#ifdef CONFIG_SCHED_CORE
extern void sched_core_free(struct task_struct *tsk);
+extern void sched_core_fork(struct task_struct *p);
#else
static inline void sched_core_free(struct task_struct *tsk) { }
+static inline void sched_core_fork(struct task_struct *p) { }
#endif

#endif
diff --git a/kernel/fork.c b/kernel/fork.c
index d16c60c..e7fd928 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2251,6 +2251,8 @@ static __latent_entropy struct task_struct *copy_process(

klp_copy_process(p);

+ sched_core_fork(p);
+
spin_lock(&current->sighand->siglock);

/*
@@ -2338,6 +2340,7 @@ static __latent_entropy struct task_struct *copy_process(
return p;

bad_fork_cancel_cgroup:
+ sched_core_free(p);
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
cgroup_cancel_fork(p, args);
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 8d0869a..dcbbeae 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -103,6 +103,12 @@ static unsigned long sched_core_clone_cookie(struct task_struct *p)
return cookie;
}

+void sched_core_fork(struct task_struct *p)
+{
+ RB_CLEAR_NODE(&p->core_node);
+ p->core_cookie = sched_core_clone_cookie(current);
+}
+
void sched_core_free(struct task_struct *p)
{
sched_core_put_cookie(p->core_cookie);

2021-05-12 20:53:13

by Josh Don

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On Wed, May 12, 2021 at 2:05 AM Peter Zijlstra <[email protected]> wrote:
> Right, I need a Champion that actually cares about cgroups and has
> use-cases to go argue with TJ on this. I've proposed code that I think
> has sane semantics, but I'm not in a position to argue for it, given I
> think a world without cgroups is a better world :-)))

Not sure if Tejun has any thoughts on
http://lkml.kernel.org/r/CABk29NtahuW6UERvRdK5v8My_MfPsoESDKXUjGdvaQcHOJEMvg@mail.gmail.com.

We're looking at using the prctl interface with one of our main
internal users of core scheduling. As an example, suppose we have a
management process that wants to make tasks A and B share a cookie:
- Spawn a new thread m, which then does the following, and exits.
- PR_SCHED_CORE_CREATE for just its own PID
- PR_SCHED_CORE_SHARE_TO A
- PR_SCHED_CORE_SHARE_TO B

That seems to work ok; I'll follow up if there are any pain points
that aren't easily addressed with the prctl interface.

2021-05-12 22:36:53

by Don Hiatt

[permalink] [raw]
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On Wed, May 12, 2021 at 1:58 PM Josh Don <[email protected]> wrote:
>
> On Wed, May 12, 2021 at 2:05 AM Peter Zijlstra <[email protected]> wrote:
> > Right, I need a Champion that actually cares about cgroups and has
> > use-cases to go argue with TJ on this. I've proposed code that I think
> > has sane semantics, but I'm not in a position to argue for it, given I
> > think a world without cgroups is a better world :-)))
>
> Not sure if Tejun has any thoughts on
> http://lkml.kernel.org/r/CABk29NtahuW6UERvRdK5v8My_MfPsoESDKXUjGdvaQcHOJEMvg@mail.gmail.com.
>
> We're looking at using the prctl interface with one of our main
> internal users of core scheduling. As an example, suppose we have a
> management process that wants to make tasks A and B share a cookie:
> - Spawn a new thread m, which then does the following, and exits.
> - PR_SCHED_CORE_CREATE for just its own PID
> - PR_SCHED_CORE_SHARE_TO A
> - PR_SCHED_CORE_SHARE_TO B
>
> That seems to work ok; I'll follow up if there are any pain points
> that aren't easily addressed with the prctl interface.

This is exactly what I'm doing to tag all the processes for qemu. I have as
shell script that creates a cookie for $BASHPID and then SHARE_TO all
the process and it works just great. :)

don