2022-01-21 13:16:36

by Tadeusz Struk

[permalink] [raw]
Subject: [PATCH] sched/fair: Fix fault in reweight_entity

Syzbot found a GPF in reweight_entity. This has been bisected to commit
c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Looks like after this change there is a time window, when
task_struct->se.cfs_rq can be NULL. This can be exploited to trigger
null-ptr-deref by calling setpriority on that task.

Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Zhang Qiao <[email protected]>
Cc: [email protected]
Cc: [email protected]

Link: https://syzkaller.appspot.com/bug?id=9d9c27adc674e3a7932b22b61c79a02da82cbdc1
Fixes: c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Signed-off-by: Tadeusz Struk <[email protected]>
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa378df..196f8cee3f9b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3042,6 +3042,9 @@ dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { }
static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
unsigned long weight)
{
+ if (!cfs_rq)
+ return;
+
if (se->on_rq) {
/* commit outstanding execution time */
if (cfs_rq->curr == se)
--
2.34.1


2022-01-21 19:02:29

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix fault in reweight_entity

On Wed, 19 Jan 2022 at 02:24, Tadeusz Struk <[email protected]> wrote:
>
> Syzbot found a GPF in reweight_entity. This has been bisected to commit
> c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
> Looks like after this change there is a time window, when
> task_struct->se.cfs_rq can be NULL. This can be exploited to trigger
> null-ptr-deref by calling setpriority on that task.
>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ben Segall <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Daniel Bristot de Oliveira <[email protected]>
> Cc: Zhang Qiao <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
>
> Link: https://syzkaller.appspot.com/bug?id=9d9c27adc674e3a7932b22b61c79a02da82cbdc1
> Fixes: c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")

The sha1 doesn't look correct.

> Signed-off-by: Tadeusz Struk <[email protected]>
> ---
> kernel/sched/fair.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 095b0aa378df..196f8cee3f9b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3042,6 +3042,9 @@ dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { }
> static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> unsigned long weight)
> {
> + if (!cfs_rq)
> + return;
> +
> if (se->on_rq) {
> /* commit outstanding execution time */
> if (cfs_rq->curr == se)
> --
> 2.34.1
>

2022-01-21 19:04:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix fault in reweight_entity

On Tue, Jan 18, 2022 at 05:24:17PM -0800, Tadeusz Struk wrote:
> Syzbot found a GPF in reweight_entity. This has been bisected to commit
> c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")

That's a stable commit, the real commit is 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a

> Looks like after this change there is a time window, when
> task_struct->se.cfs_rq can be NULL. This can be exploited to trigger
> null-ptr-deref by calling setpriority on that task.

Looks like isn't good enough, either there is, in which case you explain
the window, or there isn't in which case what are we doing here?

2022-01-21 19:39:01

by Tadeusz Struk

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix fault in reweight_entity

On 1/19/22 01:32, Peter Zijlstra wrote:
> On Tue, Jan 18, 2022 at 05:24:17PM -0800, Tadeusz Struk wrote:
>> Syzbot found a GPF in reweight_entity. This has been bisected to commit
>> c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
> That's a stable commit, the real commit is 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a

This is what syzbot bisected it to. I will reference the original commit in the
next version.

>
>> Looks like after this change there is a time window, when
>> task_struct->se.cfs_rq can be NULL. This can be exploited to trigger
>> null-ptr-deref by calling setpriority on that task.
> Looks like isn't good enough, either there is, in which case you explain
> the window, or there isn't in which case what are we doing here?

There surely is something wrong, otherwise it wouldn't crash.
I will try to narrow down the reproducer to better understand what causes
the fault.

--
Thanks,
Tadeusz

2022-01-21 21:01:34

by Tadeusz Struk

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix fault in reweight_entity

On 1/19/22 07:43, Tadeusz Struk wrote:
>>> Looks like after this change there is a time window, when
>>> task_struct->se.cfs_rq can be NULL. This can be exploited to trigger
>>> null-ptr-deref by calling setpriority on that task.
>> Looks like isn't good enough, either there is, in which case you explain
>> the window, or there isn't in which case what are we doing here?
>
> There surely is something wrong, otherwise it wouldn't crash.
> I will try to narrow down the reproducer to better understand what causes
> the fault.

The race is between sched_post_fork() and setpriority(PRIO_PGRP)
The scenario is that the main process spawns 3 new threads,
which then call setpriority(PRIO_PGRP, 0, -20), wait, and exit.
For each of the new thread the copy_process() gets invoked,
which then calls sched_fork() and finally sched_post_fork().

There is a possibility that setpriority(PRIO_PGRP)->set_one_prio() will be
called for a thread in the group that is just being created by copy_process(),
and for which the sched_post_fork() has not been executed yet.
This will trigger a null pointer dereference in reweight_entity()
because it will try to access the CFS run queue pointer, which hasn't been set,
resulting it a crash as below:

KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 0 PID: 2392 Comm: reduced_repro Not tainted
5.16.0-11201-gb42c5a161ea3-dirty #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:reweight_entity+0x15d/0x440
RSP: 0018:ffffc900035dfcf8 EFLAGS: 00010006
Call Trace:
<TASK>
reweight_task+0xde/0x1c0
set_load_weight+0x21c/0x2b0
set_user_nice.part.0+0x2d1/0x519
set_user_nice.cold+0x8/0xd
set_one_prio+0x24f/0x263
__do_sys_setpriority+0x2d3/0x640
__x64_sys_setpriority+0x84/0x8b
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
---[ end trace 9dc80a9d378ed00a ]---

Before the mentioned change the rq pointer has been set in sched_fork(),
which is called much earlier in copy_process() as opposed to sched_post_fork(),
before the new task is added to the thread_group.

A stripped down version of the sysbot reproducer can be found here:
https://termbin.com/axkq

I can consistently reproduce the issue with it in 2-3 runs.

The solution is either we set the pointer p->se.cfs_rq to a dummy rq in
sched_fork(), or return from the set_one_prio() without doing anything
if the rq is NULL, as it is done in the patch.
I will update the description and resend it tomorrow.

--
Thanks,
Tadeusz