2020-08-01 02:34:14

by Jiang Biao

[permalink] [raw]
Subject: [PATCH] sched/fair: reduce preemption with IDLE tasks runable

From: Jiang Biao <[email protected]>

No need to preempt when there are only one runable CFS task with
other IDLE tasks on runqueue. The only one CFS task would always
be picked in that case.

Signed-off-by: Jiang Biao <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04fa8dbcfa4d..8fb80636b010 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
return;
#endif

- if (cfs_rq->nr_running > 1)
+ if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
check_preempt_tick(cfs_rq, curr);
}

--
2.21.0


2020-08-03 08:20:36

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable

On 01/08/2020 04:32, Jiang Biao wrote:
> From: Jiang Biao <[email protected]>
>
> No need to preempt when there are only one runable CFS task with
> other IDLE tasks on runqueue. The only one CFS task would always
> be picked in that case.
>
> Signed-off-by: Jiang Biao <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fa8dbcfa4d..8fb80636b010 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
> return;
> #endif
>
> - if (cfs_rq->nr_running > 1)
> + if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)

cfs_rq is a pointer.

> check_preempt_tick(cfs_rq, curr);
> }

You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!

There is a difference between cfs_rq->h_nr_running and
cfs_rq->nr_running. The '_h_' stands for hierarchical.

The former gives you hierarchical task accounting whereas the latter is
the number of sched entities (representing tasks or taskgroups) enqueued
in cfs_rq.

In entity_tick(), cfs_rq->nr_running has to be used for the condition to
call check_preempt_tick(). We want to check if curr has to be preempted
by __pick_first_entity(cfs_rq) on this cfs_rq.

entity_tick() is called for each sched entity (and so for each
cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
taskgroup /A/B : se(p) -> se(A/B) -> se(A)).

2020-08-03 11:28:15

by benbjiang(蒋彪)

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)



> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <[email protected]> wrote:
>
> On 01/08/2020 04:32, Jiang Biao wrote:
>> From: Jiang Biao <[email protected]>
>>
>> No need to preempt when there are only one runable CFS task with
>> other IDLE tasks on runqueue. The only one CFS task would always
>> be picked in that case.
>>
>> Signed-off-by: Jiang Biao <[email protected]>
>> ---
>> kernel/sched/fair.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..8fb80636b010 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>> return;
>> #endif
>>
>> - if (cfs_rq->nr_running > 1)
>> + if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
>
> cfs_rq is a pointer.
It is. Sorry about that. :)

>
>> check_preempt_tick(cfs_rq, curr);
>> }
>
> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
>
> There is a difference between cfs_rq->h_nr_running and
> cfs_rq->nr_running. The '_h_' stands for hierarchical.
>
> The former gives you hierarchical task accounting whereas the latter is
> the number of sched entities (representing tasks or taskgroups) enqueued
> in cfs_rq.
>
> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
> call check_preempt_tick(). We want to check if curr has to be preempted
> by __pick_first_entity(cfs_rq) on this cfs_rq.
>
> entity_tick() is called for each sched entity (and so for each
> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
That’s true. I was thinking adding a new cfs_rq->idle_nr_running member to
track the per cfs_rq's IDLE task number, and reducing preemption here based
on that.
I’m not sure if it’s ok to do that, because the IDLE class seems not to be so
pure that could tolerate starving.
We need an absolutely low priority class that could tolerate starving, which
could be used to co-locate offline tasks. But IDLE class seems to be not
*low* enough, if considering the fairness of CFS, and IDLE class still has a
weight.

Thanks for you reply.

Regards,
Jiang

2020-08-06 17:23:40

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>
>
>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <[email protected]> wrote:
>>
>> On 01/08/2020 04:32, Jiang Biao wrote:
>>> From: Jiang Biao <[email protected]>
>>>
>>> No need to preempt when there are only one runable CFS task with
>>> other IDLE tasks on runqueue. The only one CFS task would always
>>> be picked in that case.
>>>
>>> Signed-off-by: Jiang Biao <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 04fa8dbcfa4d..8fb80636b010 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> return;
>>> #endif
>>>
>>> - if (cfs_rq->nr_running > 1)
>>> + if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
>>
>> cfs_rq is a pointer.
> It is. Sorry about that. :)
>
>>
>>> check_preempt_tick(cfs_rq, curr);
>>> }
>>
>> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
>>
>> There is a difference between cfs_rq->h_nr_running and
>> cfs_rq->nr_running. The '_h_' stands for hierarchical.
>>
>> The former gives you hierarchical task accounting whereas the latter is
>> the number of sched entities (representing tasks or taskgroups) enqueued
>> in cfs_rq.
>>
>> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
>> call check_preempt_tick(). We want to check if curr has to be preempted
>> by __pick_first_entity(cfs_rq) on this cfs_rq.
>>
>> entity_tick() is called for each sched entity (and so for each
>> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
>> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
> That’s true. I was thinking adding a new cfs_rq->idle_nr_running member to
> track the per cfs_rq's IDLE task number, and reducing preemption here based
> on that.

How would you deal with se's representing taskgroups which contain
SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?

> I’m not sure if it’s ok to do that, because the IDLE class seems not to be so
> pure that could tolerate starving.

Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
(policy)?

> We need an absolutely low priority class that could tolerate starving, which
> could be used to co-locate offline tasks. But IDLE class seems to be not
> *low* enough, if considering the fairness of CFS, and IDLE class still has a
> weight.

[...]