2020-11-17 23:25:33

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree

From: Peter Zijlstra <[email protected]>

pick_next_entity() is passed curr == NULL during core-scheduling. Due to
this, if the rbtree is empty, the 'left' variable is set to NULL within
the function. This can cause crashes within the function.

This is not an issue if put_prev_task() is invoked on the currently
running task before calling pick_next_entity(). However, in core
scheduling, it is possible that a sibling CPU picks for another RQ in
the core, via pick_task_fair(). This remote sibling would not get any
opportunities to do a put_prev_task().

Fix it by refactoring pick_task_fair() such that pick_next_entity() is
called with the cfs_rq->curr. This will prevent pick_next_entity() from
crashing if its rbtree is empty.

Also this fixes another possible bug where update_curr() would not be
called on the cfs_rq hierarchy if the rbtree is empty. This could effect
cross-cpu comparison of vruntime.

Suggested-by: Vineeth Remanan Pillai <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/sched/fair.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 12cf068eeec8..51483a00a755 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7029,15 +7029,10 @@ static struct task_struct *pick_task_fair(struct rq *rq)
do {
struct sched_entity *curr = cfs_rq->curr;

- se = pick_next_entity(cfs_rq, NULL);
-
- if (curr) {
- if (se && curr->on_rq)
- update_curr(cfs_rq);
+ if (curr && curr->on_rq)
+ update_curr(cfs_rq);

- if (!se || entity_before(curr, se))
- se = curr;
- }
+ se = pick_next_entity(cfs_rq, curr);

cfs_rq = group_cfs_rq(se);
} while (cfs_rq);
--
2.29.2.299.gdc1121823c-goog


2020-11-20 10:18:15

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree

On 18/11/20 10:19 am, Joel Fernandes (Google) wrote:
> From: Peter Zijlstra <[email protected]>
>
> pick_next_entity() is passed curr == NULL during core-scheduling. Due to
> this, if the rbtree is empty, the 'left' variable is set to NULL within
> the function. This can cause crashes within the function.
>
> This is not an issue if put_prev_task() is invoked on the currently
> running task before calling pick_next_entity(). However, in core
> scheduling, it is possible that a sibling CPU picks for another RQ in
> the core, via pick_task_fair(). This remote sibling would not get any
> opportunities to do a put_prev_task().
>
> Fix it by refactoring pick_task_fair() such that pick_next_entity() is
> called with the cfs_rq->curr. This will prevent pick_next_entity() from
> crashing if its rbtree is empty.
>
> Also this fixes another possible bug where update_curr() would not be
> called on the cfs_rq hierarchy if the rbtree is empty. This could effect
> cross-cpu comparison of vruntime.
>

It is not clear from the changelog as to what does put_prev_task() do to prevent
the crash from occuring? Why did we pass NULL as curr in the first place to
pick_next_entity?

The patch looks functionally correct as in, it passes curr as the reference
to pick_next_entity() for left and entity_before comparisons.

Balbir Singh

2020-11-20 18:15:27

by Vineeth Pillai

[permalink] [raw]
Subject: Re: [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree

Hi Balbir,

On 11/20/20 5:15 AM, Singh, Balbir wrote:
> On 18/11/20 10:19 am, Joel Fernandes (Google) wrote:
>> From: Peter Zijlstra <[email protected]>
>>
>> pick_next_entity() is passed curr == NULL during core-scheduling. Due to
>> this, if the rbtree is empty, the 'left' variable is set to NULL within
>> the function. This can cause crashes within the function.
>>
>> This is not an issue if put_prev_task() is invoked on the currently
>> running task before calling pick_next_entity(). However, in core
>> scheduling, it is possible that a sibling CPU picks for another RQ in
>> the core, via pick_task_fair(). This remote sibling would not get any
>> opportunities to do a put_prev_task().
>>
>> Fix it by refactoring pick_task_fair() such that pick_next_entity() is
>> called with the cfs_rq->curr. This will prevent pick_next_entity() from
>> crashing if its rbtree is empty.
>>
>> Also this fixes another possible bug where update_curr() would not be
>> called on the cfs_rq hierarchy if the rbtree is empty. This could effect
>> cross-cpu comparison of vruntime.
>>
> It is not clear from the changelog as to what does put_prev_task() do to prevent
> the crash from occuring? Why did we pass NULL as curr in the first place to
> pick_next_entity?
A little more context on this crash in v8 is here:
https://lwn.net/ml/linux-kernel/[email protected]/

The issue here arises from the fact that, we try to pick task for a
sibling while sibling is running a task. Running tasks are not in the
cfs_rq and pick_next_entity can return NULL if there is only one cfs
task in the cfs_rq. This would not happen normally because
put_prev_task is called before pick_task and put_prev_task adds the
task back to cfs_rq. But for coresched, pick_task is called on a
remote sibling's cfs_rq without calling put_prev_task and this can
lead to pick_next_entity returning NULL.

The initial logic of passing NULL would work fine as long as we call
put_prev_task before calling pick_task_fair. But for coresched, we
call pick_task_fair on siblings while the task is running and would
not be able to call put_prev_task. So this refactor of the code fixes
the crash by explicitly passing curr.

Hope this clarifies..

Thanks,
Vineeth

2020-11-23 22:34:34

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree

On Fri, Nov 20, 2020 at 01:11:06PM -0500, Vineeth Pillai wrote:
> Hi Balbir,
>
> On 11/20/20 5:15 AM, Singh, Balbir wrote:
> > On 18/11/20 10:19 am, Joel Fernandes (Google) wrote:
> > > From: Peter Zijlstra <[email protected]>
> > >
> > > pick_next_entity() is passed curr == NULL during core-scheduling. Due to
> > > this, if the rbtree is empty, the 'left' variable is set to NULL within
> > > the function. This can cause crashes within the function.
> > >
> > > This is not an issue if put_prev_task() is invoked on the currently
> > > running task before calling pick_next_entity(). However, in core
> > > scheduling, it is possible that a sibling CPU picks for another RQ in
> > > the core, via pick_task_fair(). This remote sibling would not get any
> > > opportunities to do a put_prev_task().
> > >
> > > Fix it by refactoring pick_task_fair() such that pick_next_entity() is
> > > called with the cfs_rq->curr. This will prevent pick_next_entity() from
> > > crashing if its rbtree is empty.
> > >
> > > Also this fixes another possible bug where update_curr() would not be
> > > called on the cfs_rq hierarchy if the rbtree is empty. This could effect
> > > cross-cpu comparison of vruntime.
> > >
> > It is not clear from the changelog as to what does put_prev_task() do to prevent
> > the crash from occuring? Why did we pass NULL as curr in the first place to
> > pick_next_entity?
> A little more context on this crash in v8 is here:
> https://lwn.net/ml/linux-kernel/[email protected]/
>
> The issue here arises from the fact that, we try to pick task for a
> sibling while sibling is running a task. Running tasks are not in the
> cfs_rq and pick_next_entity can return NULL if there is only one cfs
> task in the cfs_rq. This would not happen normally because
> put_prev_task is called before pick_task and put_prev_task adds the
> task back to cfs_rq. But for coresched, pick_task is called on a
> remote sibling's cfs_rq without calling put_prev_task and this can
> lead to pick_next_entity returning NULL.
>
> The initial logic of passing NULL would work fine as long as we call
> put_prev_task before calling pick_task_fair. But for coresched, we
> call pick_task_fair on siblings while the task is running and would
> not be able to call put_prev_task. So this refactor of the code fixes
> the crash by explicitly passing curr.
>
> Hope this clarifies..
>

Yes, it does!

Thanks,
Balbir Singh.

2020-11-24 08:36:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree

On Fri, Nov 20, 2020 at 09:15:28PM +1100, Singh, Balbir wrote:
> On 18/11/20 10:19 am, Joel Fernandes (Google) wrote:
> > From: Peter Zijlstra <[email protected]>
> >
> > pick_next_entity() is passed curr == NULL during core-scheduling. Due to
> > this, if the rbtree is empty, the 'left' variable is set to NULL within
> > the function. This can cause crashes within the function.
> >
> > This is not an issue if put_prev_task() is invoked on the currently
> > running task before calling pick_next_entity(). However, in core
> > scheduling, it is possible that a sibling CPU picks for another RQ in
> > the core, via pick_task_fair(). This remote sibling would not get any
> > opportunities to do a put_prev_task().
> >
> > Fix it by refactoring pick_task_fair() such that pick_next_entity() is
> > called with the cfs_rq->curr. This will prevent pick_next_entity() from
> > crashing if its rbtree is empty.
> >
> > Also this fixes another possible bug where update_curr() would not be
> > called on the cfs_rq hierarchy if the rbtree is empty. This could effect
> > cross-cpu comparison of vruntime.
> >
>
> It is not clear from the changelog as to what does put_prev_task() do to prevent
> the crash from occuring? Why did we pass NULL as curr in the first place to
> pick_next_entity?
>
> The patch looks functionally correct as in, it passes curr as the reference
> to pick_next_entity() for left and entity_before comparisons.

This patch should not exist; it should be smashed into the previous
patch. There is no point in preserving a crash.