2022-11-25 03:27:20

by Waiman Long

[permalink] [raw]
Subject: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

In general, a non-null user_cpus_ptr will remain set until the task dies.
A possible exception to this is the fact that do_set_cpus_allowed()
will clear a non-null user_cpus_ptr. To allow this possible racing
condition, we need to check for NULL user_cpus_ptr under the pi_lock
before duping the user mask.

Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)

[v4] Minor comment update

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8df51b08bb38..f2b75faaf71a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
int node)
{
+ cpumask_t *user_mask;
unsigned long flags;

+ /*
+ * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
+ * may differ by now due to racing.
+ */
+ dst->user_cpus_ptr = NULL;
+
+ /*
+ * This check is racy and losing the race is a valid situation.
+ * It is not worth the extra overhead of taking the pi_lock on
+ * every fork/clone.
+ */
if (!src->user_cpus_ptr)
return 0;

- dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
- if (!dst->user_cpus_ptr)
+ user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+ if (!user_mask)
return -ENOMEM;

- /* Use pi_lock to protect content of user_cpus_ptr */
+ /*
+ * Use pi_lock to protect content of user_cpus_ptr
+ *
+ * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
+ * do_set_cpus_allowed().
+ */
raw_spin_lock_irqsave(&src->pi_lock, flags);
- cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ if (src->user_cpus_ptr) {
+ swap(dst->user_cpus_ptr, user_mask);
+ cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ }
raw_spin_unlock_irqrestore(&src->pi_lock, flags);
+
+ if (unlikely(user_mask))
+ kfree(user_mask);
+
return 0;
}

--
2.31.1


2022-11-28 02:39:57

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On 11/24/22 21:39, Waiman Long wrote:
> In general, a non-null user_cpus_ptr will remain set until the task dies.
> A possible exception to this is the fact that do_set_cpus_allowed()
> will clear a non-null user_cpus_ptr. To allow this possible racing
> condition, we need to check for NULL user_cpus_ptr under the pi_lock
> before duping the user mask.
>
> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> Signed-off-by: Waiman Long <[email protected]>

This is actually a pre-existing use-after-free bug since commit
07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on
asymmetric systems"). So it needs to be fixed in the stable release as
well. Will resend the patch with an additional fixes tag and updated
commit log.

Cheers,
Longman

> ---
> kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
> 1 file changed, 28 insertions(+), 4 deletions(-)
>
> [v4] Minor comment update
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8df51b08bb38..f2b75faaf71a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
> int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
> int node)
> {
> + cpumask_t *user_mask;
> unsigned long flags;
>
> + /*
> + * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
> + * may differ by now due to racing.
> + */
> + dst->user_cpus_ptr = NULL;
> +
> + /*
> + * This check is racy and losing the race is a valid situation.
> + * It is not worth the extra overhead of taking the pi_lock on
> + * every fork/clone.
> + */
> if (!src->user_cpus_ptr)
> return 0;
>
> - dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> - if (!dst->user_cpus_ptr)
> + user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> + if (!user_mask)
> return -ENOMEM;
>
> - /* Use pi_lock to protect content of user_cpus_ptr */
> + /*
> + * Use pi_lock to protect content of user_cpus_ptr
> + *
> + * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
> + * do_set_cpus_allowed().
> + */
> raw_spin_lock_irqsave(&src->pi_lock, flags);
> - cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
> + if (src->user_cpus_ptr) {
> + swap(dst->user_cpus_ptr, user_mask);
> + cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
> + }
> raw_spin_unlock_irqrestore(&src->pi_lock, flags);
> +
> + if (unlikely(user_mask))
> + kfree(user_mask);
> +
> return 0;
> }
>

2022-11-28 12:18:01

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
> On 11/24/22 21:39, Waiman Long wrote:
> > In general, a non-null user_cpus_ptr will remain set until the task dies.
> > A possible exception to this is the fact that do_set_cpus_allowed()
> > will clear a non-null user_cpus_ptr. To allow this possible racing
> > condition, we need to check for NULL user_cpus_ptr under the pi_lock
> > before duping the user mask.
> >
> > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> > Signed-off-by: Waiman Long <[email protected]>
>
> This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
> ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
> So it needs to be fixed in the stable release as well. Will resend the patch
> with an additional fixes tag and updated commit log.

Please can you elaborate on the use-after-free here? Looking at
07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
has dropped to zero and I can't see how that can race with fork().

What am I missing?

Will

> > kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
> > 1 file changed, 28 insertions(+), 4 deletions(-)
> >
> > [v4] Minor comment update
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 8df51b08bb38..f2b75faaf71a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
> > int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
> > int node)
> > {
> > + cpumask_t *user_mask;
> > unsigned long flags;
> > + /*
> > + * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
> > + * may differ by now due to racing.
> > + */
> > + dst->user_cpus_ptr = NULL;
> > +
> > + /*
> > + * This check is racy and losing the race is a valid situation.
> > + * It is not worth the extra overhead of taking the pi_lock on
> > + * every fork/clone.
> > + */
> > if (!src->user_cpus_ptr)
> > return 0;
> > - dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> > - if (!dst->user_cpus_ptr)
> > + user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
> > + if (!user_mask)
> > return -ENOMEM;
> > - /* Use pi_lock to protect content of user_cpus_ptr */
> > + /*
> > + * Use pi_lock to protect content of user_cpus_ptr
> > + *
> > + * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
> > + * do_set_cpus_allowed().
> > + */
> > raw_spin_lock_irqsave(&src->pi_lock, flags);
> > - cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
> > + if (src->user_cpus_ptr) {
> > + swap(dst->user_cpus_ptr, user_mask);
> > + cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
> > + }
> > raw_spin_unlock_irqrestore(&src->pi_lock, flags);
> > +
> > + if (unlikely(user_mask))
> > + kfree(user_mask);
> > +
> > return 0;
> > }
>

2022-11-28 15:38:56

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On 11/28/22 07:00, Will Deacon wrote:
> On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
>> On 11/24/22 21:39, Waiman Long wrote:
>>> In general, a non-null user_cpus_ptr will remain set until the task dies.
>>> A possible exception to this is the fact that do_set_cpus_allowed()
>>> will clear a non-null user_cpus_ptr. To allow this possible racing
>>> condition, we need to check for NULL user_cpus_ptr under the pi_lock
>>> before duping the user mask.
>>>
>>> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
>>> Signed-off-by: Waiman Long <[email protected]>
>> This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
>> ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
>> So it needs to be fixed in the stable release as well. Will resend the patch
>> with an additional fixes tag and updated commit log.
> Please can you elaborate on the use-after-free here? Looking at
> 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
> has dropped to zero and I can't see how that can race with fork().
>
> What am I missing?

I missed that at first. The current task cloning process copies the
content of the task structure over to the newly cloned/forked task. IOW,
if user_cpus_ptr had been set up previously, it will be copied over to
the cloned task. Now if user_cpus_ptr of the source task is cleared
right after that and before dup_user_cpus_ptr() is called. The obsolete
user_cpus_ptr value in the cloned task will remain and get used even if
it has been freed. That is what I call as use-after-free and double-free.

Cheers,
Longman

2022-11-29 14:14:44

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
> On 11/28/22 07:00, Will Deacon wrote:
> > On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
> > > On 11/24/22 21:39, Waiman Long wrote:
> > > > In general, a non-null user_cpus_ptr will remain set until the task dies.
> > > > A possible exception to this is the fact that do_set_cpus_allowed()
> > > > will clear a non-null user_cpus_ptr. To allow this possible racing
> > > > condition, we need to check for NULL user_cpus_ptr under the pi_lock
> > > > before duping the user mask.
> > > >
> > > > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> > > > Signed-off-by: Waiman Long <[email protected]>
> > > This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
> > > ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
> > > So it needs to be fixed in the stable release as well. Will resend the patch
> > > with an additional fixes tag and updated commit log.
> > Please can you elaborate on the use-after-free here? Looking at
> > 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
> > has dropped to zero and I can't see how that can race with fork().
> >
> > What am I missing?
>
> I missed that at first. The current task cloning process copies the content
> of the task structure over to the newly cloned/forked task. IOW, if
> user_cpus_ptr had been set up previously, it will be copied over to the
> cloned task. Now if user_cpus_ptr of the source task is cleared right after
> that and before dup_user_cpus_ptr() is called. The obsolete user_cpus_ptr
> value in the cloned task will remain and get used even if it has been freed.
> That is what I call as use-after-free and double-free.

If the parent task can be modified concurrently with dup_task_struct() then
surely we'd have bigger issues because that's not going to be atomic? At the
very least we'd have a data race, but it also feels like we could end up
with inconsistent task state in the child. In fact, couldn't the normal
'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?

Or am I still failing to understand the race?

Will

2022-11-29 15:57:52

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On 11/29/22 09:07, Will Deacon wrote:
> On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
>> On 11/28/22 07:00, Will Deacon wrote:
>>> On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
>>>> On 11/24/22 21:39, Waiman Long wrote:
>>>>> In general, a non-null user_cpus_ptr will remain set until the task dies.
>>>>> A possible exception to this is the fact that do_set_cpus_allowed()
>>>>> will clear a non-null user_cpus_ptr. To allow this possible racing
>>>>> condition, we need to check for NULL user_cpus_ptr under the pi_lock
>>>>> before duping the user mask.
>>>>>
>>>>> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>> This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
>>>> ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
>>>> So it needs to be fixed in the stable release as well. Will resend the patch
>>>> with an additional fixes tag and updated commit log.
>>> Please can you elaborate on the use-after-free here? Looking at
>>> 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
>>> has dropped to zero and I can't see how that can race with fork().
>>>
>>> What am I missing?
>> I missed that at first. The current task cloning process copies the content
>> of the task structure over to the newly cloned/forked task. IOW, if
>> user_cpus_ptr had been set up previously, it will be copied over to the
>> cloned task. Now if user_cpus_ptr of the source task is cleared right after
>> that and before dup_user_cpus_ptr() is called. The obsolete user_cpus_ptr
>> value in the cloned task will remain and get used even if it has been freed.
>> That is what I call as use-after-free and double-free.
> If the parent task can be modified concurrently with dup_task_struct() then
> surely we'd have bigger issues because that's not going to be atomic? At the
> very least we'd have a data race, but it also feels like we could end up
> with inconsistent task state in the child. In fact, couldn't the normal
> 'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?
>
> Or am I still failing to understand the race?
>
> Will

A major difference between cpus_mask and user_cpus_ptr is that for
cpus_mask, the bitmap is embedded into task_struct whereas user_cpus_ptr
is a pointer to an external bitmap. So there is no issue of
use-after-free wrt cpus_mask. That is not the case where the memory of
the user_cpus_ptr of the parent task is freed, but then a reference to
that memory is still available in the child's task struct and may be used.

Note that the problematic concurrence is not between the copying of task
struct and changing of the task struct. It is what will happen after the
task struct copying has already been done with an extra reference
present in the child's task struct.

Cheers,
Longman

2022-11-29 16:25:32

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On Tue, Nov 29, 2022 at 10:32:49AM -0500, Waiman Long wrote:
> On 11/29/22 09:07, Will Deacon wrote:
> > On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
> > > On 11/28/22 07:00, Will Deacon wrote:
> > > > On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
> > > > > On 11/24/22 21:39, Waiman Long wrote:
> > > > > > In general, a non-null user_cpus_ptr will remain set until the task dies.
> > > > > > A possible exception to this is the fact that do_set_cpus_allowed()
> > > > > > will clear a non-null user_cpus_ptr. To allow this possible racing
> > > > > > condition, we need to check for NULL user_cpus_ptr under the pi_lock
> > > > > > before duping the user mask.
> > > > > >
> > > > > > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> > > > > > Signed-off-by: Waiman Long <[email protected]>
> > > > > This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
> > > > > ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
> > > > > So it needs to be fixed in the stable release as well. Will resend the patch
> > > > > with an additional fixes tag and updated commit log.
> > > > Please can you elaborate on the use-after-free here? Looking at
> > > > 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
> > > > has dropped to zero and I can't see how that can race with fork().
> > > >
> > > > What am I missing?
> > > I missed that at first. The current task cloning process copies the content
> > > of the task structure over to the newly cloned/forked task. IOW, if
> > > user_cpus_ptr had been set up previously, it will be copied over to the
> > > cloned task. Now if user_cpus_ptr of the source task is cleared right after
> > > that and before dup_user_cpus_ptr() is called. The obsolete user_cpus_ptr
> > > value in the cloned task will remain and get used even if it has been freed.
> > > That is what I call as use-after-free and double-free.
> > If the parent task can be modified concurrently with dup_task_struct() then
> > surely we'd have bigger issues because that's not going to be atomic? At the
> > very least we'd have a data race, but it also feels like we could end up
> > with inconsistent task state in the child. In fact, couldn't the normal
> > 'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?
> >
> > Or am I still failing to understand the race?
> >
> A major difference between cpus_mask and user_cpus_ptr is that for
> cpus_mask, the bitmap is embedded into task_struct whereas user_cpus_ptr is
> a pointer to an external bitmap. So there is no issue of use-after-free wrt
> cpus_mask. That is not the case where the memory of the user_cpus_ptr of the
> parent task is freed, but then a reference to that memory is still available
> in the child's task struct and may be used.

Sure, I'm not saying there's a UAF on cpus_mask, but I'm concerned that we
could corrupt the data and end up with an affinity mask that doesn't correspond
to anything meaningful. Do you agree that's possible?

> Note that the problematic concurrence is not between the copying of task
> struct and changing of the task struct. It is what will happen after the
> task struct copying has already been done with an extra reference present in
> the child's task struct.

Well, sort of, but the child only has the extra reference _because_ the parent
pointer was concurrently cleared to NULL, otherwise dup_user_cpus_ptr() would
have allocated a new copy and we'd be ok, no?

Overall, I'm just very wary that we seem to be saying that copy_process()
can run concurrently with changes to the parent. Maybe it's all been written
with that in mindi (including all the arch callbacks), but I'd be astonished
if this is the only problem in there.

Will

2022-11-29 16:40:43

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On 11/29/22 10:57, Will Deacon wrote:
> On Tue, Nov 29, 2022 at 10:32:49AM -0500, Waiman Long wrote:
>> On 11/29/22 09:07, Will Deacon wrote:
>>> On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
>>>> On 11/28/22 07:00, Will Deacon wrote:
>>>>> On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
>>>>>> On 11/24/22 21:39, Waiman Long wrote:
>>>>>>> In general, a non-null user_cpus_ptr will remain set until the task dies.
>>>>>>> A possible exception to this is the fact that do_set_cpus_allowed()
>>>>>>> will clear a non-null user_cpus_ptr. To allow this possible racing
>>>>>>> condition, we need to check for NULL user_cpus_ptr under the pi_lock
>>>>>>> before duping the user mask.
>>>>>>>
>>>>>>> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
>>>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>>>> This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
>>>>>> ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
>>>>>> So it needs to be fixed in the stable release as well. Will resend the patch
>>>>>> with an additional fixes tag and updated commit log.
>>>>> Please can you elaborate on the use-after-free here? Looking at
>>>>> 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
>>>>> has dropped to zero and I can't see how that can race with fork().
>>>>>
>>>>> What am I missing?
>>>> I missed that at first. The current task cloning process copies the content
>>>> of the task structure over to the newly cloned/forked task. IOW, if
>>>> user_cpus_ptr had been set up previously, it will be copied over to the
>>>> cloned task. Now if user_cpus_ptr of the source task is cleared right after
>>>> that and before dup_user_cpus_ptr() is called. The obsolete user_cpus_ptr
>>>> value in the cloned task will remain and get used even if it has been freed.
>>>> That is what I call as use-after-free and double-free.
>>> If the parent task can be modified concurrently with dup_task_struct() then
>>> surely we'd have bigger issues because that's not going to be atomic? At the
>>> very least we'd have a data race, but it also feels like we could end up
>>> with inconsistent task state in the child. In fact, couldn't the normal
>>> 'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?
>>>
>>> Or am I still failing to understand the race?
>>>
>> A major difference between cpus_mask and user_cpus_ptr is that for
>> cpus_mask, the bitmap is embedded into task_struct whereas user_cpus_ptr is
>> a pointer to an external bitmap. So there is no issue of use-after-free wrt
>> cpus_mask. That is not the case where the memory of the user_cpus_ptr of the
>> parent task is freed, but then a reference to that memory is still available
>> in the child's task struct and may be used.
> Sure, I'm not saying there's a UAF on cpus_mask, but I'm concerned that we
> could corrupt the data and end up with an affinity mask that doesn't correspond
> to anything meaningful. Do you agree that's possible?
That is certainly possible. So we have to be careful about it.
>
>> Note that the problematic concurrence is not between the copying of task
>> struct and changing of the task struct. It is what will happen after the
>> task struct copying has already been done with an extra reference present in
>> the child's task struct.
> Well, sort of, but the child only has the extra reference _because_ the parent
> pointer was concurrently cleared to NULL, otherwise dup_user_cpus_ptr() would
> have allocated a new copy and we'd be ok, no?
Yes, that is exactly where the problem is and this is what my patch is
trying to fix.
>
> Overall, I'm just very wary that we seem to be saying that copy_process()
> can run concurrently with changes to the parent. Maybe it's all been written
> with that in mindi (including all the arch callbacks), but I'd be astonished
> if this is the only problem in there.

It seems like that, at least in some cases, the clearing of a task's
user_cpus_ptr can be done by another task. So the parent may be unaware
of it and so is not its fault.

Cheers,
Longman

2022-11-30 12:38:01

by David Wang 王标

[permalink] [raw]
Subject: 答复: [External Mail]Re: [PATCH-tip v4] sch ed: Fix NULL user_cpus_ptr check in dup_user_ cpus_ptr()

Dear Will, Longman,

Could we fix the issue first we met? We can analyze other issue later.

Thanks


-----邮件原件-----
发件人: Waiman Long <[email protected]>
发送时间: 2022年11月30日 0:04
收件人: Will Deacon <[email protected]>
抄送: Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>; Juri Lelli <[email protected]>; Vincent Guittot <[email protected]>; Dietmar Eggemann <[email protected]>; Steven Rostedt <[email protected]>; Ben Segall <[email protected]>; Mel Gorman <[email protected]>; Daniel Bristot de Oliveira <[email protected]>; Phil Auld <[email protected]>; Wenjie Li <[email protected]>; David Wang 王标 <[email protected]>; [email protected]
主题: [External Mail]Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

[外部邮件] 此邮件来源于小米公司外部,请谨慎处理。若对邮件安全性存疑,请将邮件转发给[email protected]进行反馈

On 11/29/22 10:57, Will Deacon wrote:
> On Tue, Nov 29, 2022 at 10:32:49AM -0500, Waiman Long wrote:
>> On 11/29/22 09:07, Will Deacon wrote:
>>> On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
>>>> On 11/28/22 07:00, Will Deacon wrote:
>>>>> On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
>>>>>> On 11/24/22 21:39, Waiman Long wrote:
>>>>>>> In general, a non-null user_cpus_ptr will remain set until the task dies.
>>>>>>> A possible exception to this is the fact that
>>>>>>> do_set_cpus_allowed() will clear a non-null user_cpus_ptr. To
>>>>>>> allow this possible racing condition, we need to check for NULL
>>>>>>> user_cpus_ptr under the pi_lock before duping the user mask.
>>>>>>>
>>>>>>> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in
>>>>>>> do_set_cpus_allowed()")
>>>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>>>> This is actually a pre-existing use-after-free bug since commit
>>>>>> 07ec77a1d4e8
>>>>>> ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
>>>>>> So it needs to be fixed in the stable release as well. Will
>>>>>> resend the patch with an additional fixes tag and updated commit log.
>>>>> Please can you elaborate on the use-after-free here? Looking at
>>>>> 07ec77a1d4e8, the mask is only freed in free_task() when the usage
>>>>> refcount has dropped to zero and I can't see how that can race with fork().
>>>>>
>>>>> What am I missing?
>>>> I missed that at first. The current task cloning process copies the
>>>> content of the task structure over to the newly cloned/forked task.
>>>> IOW, if user_cpus_ptr had been set up previously, it will be copied
>>>> over to the cloned task. Now if user_cpus_ptr of the source task is
>>>> cleared right after that and before dup_user_cpus_ptr() is called.
>>>> The obsolete user_cpus_ptr value in the cloned task will remain and get used even if it has been freed.
>>>> That is what I call as use-after-free and double-free.
>>> If the parent task can be modified concurrently with
>>> dup_task_struct() then surely we'd have bigger issues because that's
>>> not going to be atomic? At the very least we'd have a data race, but
>>> it also feels like we could end up with inconsistent task state in
>>> the child. In fact, couldn't the normal 'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?
>>>
>>> Or am I still failing to understand the race?
>>>
>> A major difference between cpus_mask and user_cpus_ptr is that for
>> cpus_mask, the bitmap is embedded into task_struct whereas
>> user_cpus_ptr is a pointer to an external bitmap. So there is no
>> issue of use-after-free wrt cpus_mask. That is not the case where the
>> memory of the user_cpus_ptr of the parent task is freed, but then a
>> reference to that memory is still available in the child's task struct and may be used.
> Sure, I'm not saying there's a UAF on cpus_mask, but I'm concerned
> that we could corrupt the data and end up with an affinity mask that
> doesn't correspond to anything meaningful. Do you agree that's possible?
That is certainly possible. So we have to be careful about it.
>
>> Note that the problematic concurrence is not between the copying of
>> task struct and changing of the task struct. It is what will happen
>> after the task struct copying has already been done with an extra
>> reference present in the child's task struct.
> Well, sort of, but the child only has the extra reference _because_
> the parent pointer was concurrently cleared to NULL, otherwise
> dup_user_cpus_ptr() would have allocated a new copy and we'd be ok, no?
Yes, that is exactly where the problem is and this is what my patch is trying to fix.
>
> Overall, I'm just very wary that we seem to be saying that
> copy_process() can run concurrently with changes to the parent. Maybe
> it's all been written with that in mindi (including all the arch
> callbacks), but I'd be astonished if this is the only problem in there.

It seems like that, at least in some cases, the clearing of a task's user_cpus_ptr can be done by another task. So the parent may be unaware of it and so is not its fault.

Cheers,
Longman

#/******本邮件及其附件含有小米公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!******/#

2022-11-30 14:33:30

by Wenjie Li (Evan)

[permalink] [raw]
Subject: RE: [External Mail]Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

Dear Will and Longman.

Maybe we can solve this scenario firstly if it dose not effect other architecture.

Thanks.

-----Original Message-----
From: David Wang 王标 <[email protected]>
Sent: Wednesday, November 30, 2022 7:52 PM
To: Waiman Long <[email protected]>; Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>; Juri Lelli <[email protected]>; Vincent Guittot <[email protected]>; Dietmar Eggemann <[email protected]>; Steven Rostedt <[email protected]>; Ben Segall <[email protected]>; Mel Gorman <[email protected]>; Daniel Bristot de Oliveira <[email protected]>; Phil Auld <[email protected]>; [email protected]; Wenjie Li (QUIC) <[email protected]>; [email protected]; [email protected]; 刘添威 <[email protected]>; [email protected]; Ting11 Wang 王婷 <[email protected]>
Subject: 答复: [External Mail]Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.

Dear Will, Longman,

Could we fix the issue first we met? We can analyze other issue later.

Thanks


-----邮件原件-----
发件人: Waiman Long <[email protected]>
发送时间: 2022年11月30日 0:04
收件人: Will Deacon <[email protected]>
抄送: Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>; Juri Lelli <[email protected]>; Vincent Guittot <[email protected]>; Dietmar Eggemann <[email protected]>; Steven Rostedt <[email protected]>; Ben Segall <[email protected]>; Mel Gorman <[email protected]>; Daniel Bristot de Oliveira <[email protected]>; Phil Auld <[email protected]>; Wenjie Li <[email protected]>; David Wang 王标 <[email protected]>; [email protected]
主题: [External Mail]Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

[外部邮件] 此邮件来源于小米公司外部,请谨慎处理。若对邮件安全性存疑,请将邮件转发给[email protected]进行反馈

On 11/29/22 10:57, Will Deacon wrote:
> On Tue, Nov 29, 2022 at 10:32:49AM -0500, Waiman Long wrote:
>> On 11/29/22 09:07, Will Deacon wrote:
>>> On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
>>>> On 11/28/22 07:00, Will Deacon wrote:
>>>>> On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
>>>>>> On 11/24/22 21:39, Waiman Long wrote:
>>>>>>> In general, a non-null user_cpus_ptr will remain set until the task dies.
>>>>>>> A possible exception to this is the fact that
>>>>>>> do_set_cpus_allowed() will clear a non-null user_cpus_ptr. To
>>>>>>> allow this possible racing condition, we need to check for NULL
>>>>>>> user_cpus_ptr under the pi_lock before duping the user mask.
>>>>>>>
>>>>>>> Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in
>>>>>>> do_set_cpus_allowed()")
>>>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>>>> This is actually a pre-existing use-after-free bug since commit
>>>>>> 07ec77a1d4e8
>>>>>> ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
>>>>>> So it needs to be fixed in the stable release as well. Will
>>>>>> resend the patch with an additional fixes tag and updated commit log.
>>>>> Please can you elaborate on the use-after-free here? Looking at
>>>>> 07ec77a1d4e8, the mask is only freed in free_task() when the usage
>>>>> refcount has dropped to zero and I can't see how that can race with fork().
>>>>>
>>>>> What am I missing?
>>>> I missed that at first. The current task cloning process copies the
>>>> content of the task structure over to the newly cloned/forked task.
>>>> IOW, if user_cpus_ptr had been set up previously, it will be copied
>>>> over to the cloned task. Now if user_cpus_ptr of the source task is
>>>> cleared right after that and before dup_user_cpus_ptr() is called.
>>>> The obsolete user_cpus_ptr value in the cloned task will remain and get used even if it has been freed.
>>>> That is what I call as use-after-free and double-free.
>>> If the parent task can be modified concurrently with
>>> dup_task_struct() then surely we'd have bigger issues because that's
>>> not going to be atomic? At the very least we'd have a data race, but
>>> it also feels like we could end up with inconsistent task state in
>>> the child. In fact, couldn't the normal 'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?
>>>
>>> Or am I still failing to understand the race?
>>>
>> A major difference between cpus_mask and user_cpus_ptr is that for
>> cpus_mask, the bitmap is embedded into task_struct whereas
>> user_cpus_ptr is a pointer to an external bitmap. So there is no
>> issue of use-after-free wrt cpus_mask. That is not the case where the
>> memory of the user_cpus_ptr of the parent task is freed, but then a
>> reference to that memory is still available in the child's task struct and may be used.
> Sure, I'm not saying there's a UAF on cpus_mask, but I'm concerned
> that we could corrupt the data and end up with an affinity mask that
> doesn't correspond to anything meaningful. Do you agree that's possible?
That is certainly possible. So we have to be careful about it.
>
>> Note that the problematic concurrence is not between the copying of
>> task struct and changing of the task struct. It is what will happen
>> after the task struct copying has already been done with an extra
>> reference present in the child's task struct.
> Well, sort of, but the child only has the extra reference _because_
> the parent pointer was concurrently cleared to NULL, otherwise
> dup_user_cpus_ptr() would have allocated a new copy and we'd be ok, no?
Yes, that is exactly where the problem is and this is what my patch is trying to fix.
>
> Overall, I'm just very wary that we seem to be saying that
> copy_process() can run concurrently with changes to the parent. Maybe
> it's all been written with that in mindi (including all the arch
> callbacks), but I'd be astonished if this is the only problem in there.

It seems like that, at least in some cases, the clearing of a task's user_cpus_ptr can be done by another task. So the parent may be unaware of it and so is not its fault.

Cheers,
Longman

#/******本邮件及其附件含有小米公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!******/#

2022-12-01 14:05:32

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

On Tue, Nov 29, 2022 at 11:03:35AM -0500, Waiman Long wrote:
> On 11/29/22 10:57, Will Deacon wrote:
> > On Tue, Nov 29, 2022 at 10:32:49AM -0500, Waiman Long wrote:
> > > On 11/29/22 09:07, Will Deacon wrote:
> > > > On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
> > > > > On 11/28/22 07:00, Will Deacon wrote:
> > > > > > On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
> > > > > > > On 11/24/22 21:39, Waiman Long wrote:
> > > > > > > > In general, a non-null user_cpus_ptr will remain set until the task dies.
> > > > > > > > A possible exception to this is the fact that do_set_cpus_allowed()
> > > > > > > > will clear a non-null user_cpus_ptr. To allow this possible racing
> > > > > > > > condition, we need to check for NULL user_cpus_ptr under the pi_lock
> > > > > > > > before duping the user mask.
> > > > > > > >
> > > > > > > > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> > > > > > > > Signed-off-by: Waiman Long <[email protected]>
> > > > > > > This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
> > > > > > > ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
> > > > > > > So it needs to be fixed in the stable release as well. Will resend the patch
> > > > > > > with an additional fixes tag and updated commit log.
> > > > > > Please can you elaborate on the use-after-free here? Looking at
> > > > > > 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
> > > > > > has dropped to zero and I can't see how that can race with fork().
> > > > > >
> > > > > > What am I missing?
> > > > > I missed that at first. The current task cloning process copies the content
> > > > > of the task structure over to the newly cloned/forked task. IOW, if
> > > > > user_cpus_ptr had been set up previously, it will be copied over to the
> > > > > cloned task. Now if user_cpus_ptr of the source task is cleared right after
> > > > > that and before dup_user_cpus_ptr() is called. The obsolete user_cpus_ptr
> > > > > value in the cloned task will remain and get used even if it has been freed.
> > > > > That is what I call as use-after-free and double-free.
> > > > If the parent task can be modified concurrently with dup_task_struct() then
> > > > surely we'd have bigger issues because that's not going to be atomic? At the
> > > > very least we'd have a data race, but it also feels like we could end up
> > > > with inconsistent task state in the child. In fact, couldn't the normal
> > > > 'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?
> > > >
> > > > Or am I still failing to understand the race?
> > > >
> > > A major difference between cpus_mask and user_cpus_ptr is that for
> > > cpus_mask, the bitmap is embedded into task_struct whereas user_cpus_ptr is
> > > a pointer to an external bitmap. So there is no issue of use-after-free wrt
> > > cpus_mask. That is not the case where the memory of the user_cpus_ptr of the
> > > parent task is freed, but then a reference to that memory is still available
> > > in the child's task struct and may be used.
> > Sure, I'm not saying there's a UAF on cpus_mask, but I'm concerned that we
> > could corrupt the data and end up with an affinity mask that doesn't correspond
> > to anything meaningful. Do you agree that's possible?
> That is certainly possible. So we have to be careful about it.

Hmm, but we're not being particularly careful, are we? I hacked memcpy()
to be byte-to-byte to make things a bit easier to reproduce, and sure enough
I can race a bog-standard sched_setaffinity() call w/ fork():

[ 1663.935258] BUG: KCSAN: data-race in arch_dup_task_struct+0x4c/0x224
[ 1663.936872]
[ 1663.937292] race at unknown origin, with read to 0xffff06a44b8880a9 of 1 bytes by task 351 on cpu 0:
[ 1663.938770] arch_dup_task_struct+0x4c/0x224
[ 1663.939621] dup_task_struct+0x68/0x2a8
[ 1663.940381] copy_process+0x208/0x1404
[ 1663.941109] kernel_clone+0xdc/0x2c8
[ 1663.941814] __arm64_sys_clone+0x9c/0xd4
[ 1663.942909] invoke_syscall+0x54/0x170
[ 1663.943816] el0_svc_common+0x100/0x148
[ 1663.944607] do_el0_svc+0x40/0x10c
[ 1663.945333] el0_svc+0x2c/0x7c
[ 1663.946006] el0t_64_sync_handler+0x84/0xf0
[ 1663.946804] el0t_64_sync+0x18c/0x190

I then managed to get the child process to run with an affinity mask (i.e.
'task_struct::cpus_mask') of *zero*, which triggers the select_fallback_rq()
logic:

| process 14622 (waiman) no longer affine to cpu0

So sure, it's not a UAF, but I still think it's an issue that should be
fixed.

Will