For the same reason as the previous commit: if we don't reset the
sched_avg last_update_time to 0, then after a while, in switched_to_fair():
switched_to_fair
attach_task_cfs_rq
attach_entity_cfs_rq
update_load_avg
__update_load_avg_se(now, cfs_rq, se)
The delta (now - sa->last_update_time) will wrongly contribute to or
decay sched_avg, depending on the task's running/runnable status at
that time.
This patch resets the sched_avg last_update_time to 0 to stop load
tracking for the !fair task; later, in switched_to_fair() ->
update_load_avg(), we can use its saved sched_avg.
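For reference, the reset works because of the existing age-load guard in
update_load_avg(); a paraphrased sketch (exact context may differ across
kernel versions):

    static inline void update_load_avg(struct cfs_rq *cfs_rq,
                                       struct sched_entity *se, int flags)
    {
            u64 now = cfs_rq_clock_pelt(cfs_rq);

            /*
             * last_update_time == 0 means the entity is detached from
             * PELT: skip the aging step entirely.
             */
            if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
                    __update_load_avg_se(now, cfs_rq, se);
            ...
    }

With last_update_time == 0, the stale delta is never applied and the
saved _avg/_sum values are attached unchanged.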
Signed-off-by: Chengming Zhou <[email protected]>
---
kernel/sched/fair.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 50f65a2ede32..576028f5a09e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
static void switched_from_fair(struct rq *rq, struct task_struct *p)
{
detach_task_cfs_rq(p);
+
+#ifdef CONFIG_SMP
+ /* Stop load tracking for !fair task */
+ p->se.avg.last_update_time = 0;
+#endif
}
static void switched_to_fair(struct rq *rq, struct task_struct *p)
--
2.36.1
On 13/07/2022 06:04, Chengming Zhou wrote:
> For the same reason as the previous commit: if we don't reset the
> sched_avg last_update_time to 0, then after a while, in switched_to_fair():
>
> switched_to_fair
> attach_task_cfs_rq
> attach_entity_cfs_rq
> update_load_avg
> __update_load_avg_se(now, cfs_rq, se)
>
> The delta (now - sa->last_update_time) will wrongly contribute to or
> decay sched_avg, depending on the task's running/runnable status at
> that time.
IMHO, a queued !fair task when switching back to fair will already be
enqueued (attached) as a fair task in __sched_setscheduler() prior to
the check_class_changed() call.
I can't see how this would work with your proposed change of using
last_update_time=0 for fair->!fair->fair class changes.
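To illustrate the ordering I mean above, roughly paraphrased from
__sched_setscheduler() in kernel/sched/core.c (details vary between
kernel versions):

    queued = task_on_rq_queued(p);
    if (queued)
            dequeue_task(rq, p, queue_flags);

    prev_class = p->sched_class;
    /* ... p->sched_class is changed to &fair_sched_class here ... */

    if (queued)
            enqueue_task(rq, p, queue_flags);  /* enqueued (attached) as fair */

    check_class_changed(rq, p, prev_class, oldprio);  /* -> switched_to_fair() */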
> This patch resets the sched_avg last_update_time to 0 to stop load
> tracking for the !fair task; later, in switched_to_fair() ->
> update_load_avg(), we can use its saved sched_avg.
>
> Signed-off-by: Chengming Zhou <[email protected]>
> ---
> kernel/sched/fair.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 50f65a2ede32..576028f5a09e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
> static void switched_from_fair(struct rq *rq, struct task_struct *p)
> {
> detach_task_cfs_rq(p);
> +
> +#ifdef CONFIG_SMP
> + /* Stop load tracking for !fair task */
> + p->se.avg.last_update_time = 0;
> +#endif
> }
>
> static void switched_to_fair(struct rq *rq, struct task_struct *p)
On 2022/7/14 20:33, Dietmar Eggemann wrote:
> On 13/07/2022 06:04, Chengming Zhou wrote:
>> For the same reason as the previous commit: if we don't reset the
>> sched_avg last_update_time to 0, then after a while, in switched_to_fair():
>>
>> switched_to_fair
>> attach_task_cfs_rq
>> attach_entity_cfs_rq
>> update_load_avg
>> __update_load_avg_se(now, cfs_rq, se)
>>
>> The delta (now - sa->last_update_time) will wrongly contribute to or
>> decay sched_avg, depending on the task's running/runnable status at
>> that time.
>
> IMHO, a queued !fair task when switching back to fair will already be
> enqueued (attached) as a fair task in __sched_setscheduler() prior to
> the check_class_changed() call.
Right, this is true for a queued !fair task: it will be enqueued (attached)
before check_class_changed().
enqueue_task_fair()
enqueue_entity()
update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
__update_load_avg_se(now, cfs_rq, se) --> (1)
check_class_changed()
switched_to_fair()
attach_task_cfs_rq()
attach_entity_cfs_rq()
update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD)
if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
__update_load_avg_se(now, cfs_rq, se) --> (2)
1. for queued !fair: (1) delta = (now - last_update_time), where
last_update_time is the time of switched_from_fair().
2. for !queued !fair: (2) delta = (now - last_update_time), where
last_update_time is the time of switched_from_fair().
The scenario in the commit message only covers the !queued !fair case;
I forgot the queued !fair case, but the problem is the same.
>
> I can't see how this would work with your proposed change of using
> last_update_time=0 for fair->!fair->fair class changes.
If we reset last_update_time=0 for the !fair task, then (see the sketch
below):
1. for queued !fair: we will not do (1), since the if condition is false.
2. for !queued !fair: we will not do (2), since the if condition is false.
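What happens instead, in both cases, are the other branches of
update_load_avg() (paraphrased, details may differ by version):

    /* (1)/(2) are skipped: last_update_time == 0 */
    if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
            __update_load_avg_se(now, cfs_rq, se);

    /*
     * Taken on the enqueue path, where DO_ATTACH is set: the saved
     * _avg/_sum values are attached without any aging/decay.
     */
    if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
            attach_entity_load_avg(cfs_rq, se);
            update_tg_load_avg(cfs_rq);
    }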
Thanks.
>
>> This patch resets the sched_avg last_update_time to 0 to stop load
>> tracking for the !fair task; later, in switched_to_fair() ->
>> update_load_avg(), we can use its saved sched_avg.
>>
>> Signed-off-by: Chengming Zhou <[email protected]>
>> ---
>> kernel/sched/fair.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 50f65a2ede32..576028f5a09e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
>> static void switched_from_fair(struct rq *rq, struct task_struct *p)
>> {
>> detach_task_cfs_rq(p);
>> +
>> +#ifdef CONFIG_SMP
>> + /* Stop load tracking for !fair task */
>> + p->se.avg.last_update_time = 0;
>> +#endif
>> }
>>
>> static void switched_to_fair(struct rq *rq, struct task_struct *p)
>
On 14/07/2022 15:43, Chengming Zhou wrote:
> On 2022/7/14 20:33, Dietmar Eggemann wrote:
>> On 13/07/2022 06:04, Chengming Zhou wrote:
[...]
>> IMHO, a queued !fair task when switching back to fair will already be
>> enqueued (attached) as a fair task in __sched_setscheduler() prior to
>> the check_class_changed() call.
>
> Right, this is true for a queued !fair task: it will be enqueued (attached)
> before check_class_changed().
>
> enqueue_task_fair()
> enqueue_entity()
> update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
> __update_load_avg_se(now, cfs_rq, se) --> (1)
> check_class_changed()
> switched_to_fair()
> attach_task_cfs_rq()
> attach_entity_cfs_rq()
> update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD)
> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
> __update_load_avg_se(now, cfs_rq, se) --> (2)
>
>
> 1. for queued !fair: (1) delta = (now - last_update_time), where
> last_update_time is the time of switched_from_fair().
>
> 2. for !queued !fair: (2) delta = (now - last_update_time), where
> last_update_time is the time of switched_from_fair().
>
> The scenario in the commit message only covers the !queued !fair case;
> I forgot the queued !fair case, but the problem is the same.
OK, that makes sense to me then.
>> I can't see how this would work with your proposed change of using
>> last_update_time=0 for fair->!fair->fair class changes.
> If we reset last_update_time=0 for the !fair task, then:
>
> 1. for queued !fair: we will not do (1), since the if condition is false.
>
> 2. for !queued !fair: we will not do (2), since the if condition is false.
OK.
[...]
>>> This patch resets the sched_avg last_update_time to 0 to stop load
>>> tracking for the !fair task; later, in switched_to_fair() ->
>>> update_load_avg(), we can use its saved sched_avg.
>>>
>>> Signed-off-by: Chengming Zhou <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 50f65a2ede32..576028f5a09e 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
>>> static void switched_from_fair(struct rq *rq, struct task_struct *p)
>>> {
>>> detach_task_cfs_rq(p);
>>> +
>>> +#ifdef CONFIG_SMP
>>> + /* Stop load tracking for !fair task */
You're not really stopping p->se load tracking by doing this. We don't
do this outside fair anyway. IMHO, you freeze p->se's PELT _avg/_sum
values to be used as initial values when re-entering fair.
>>> + p->se.avg.last_update_time = 0;
>>> +#endif
>>> }
>>>
>>> static void switched_to_fair(struct rq *rq, struct task_struct *p)
On 2022/7/15 19:15, Dietmar Eggemann wrote:
> On 14/07/2022 15:43, Chengming Zhou wrote:
>> On 2022/7/14 20:33, Dietmar Eggemann wrote:
>>> On 13/07/2022 06:04, Chengming Zhou wrote:
>
> [...]
>
>>> IMHO, a queued !fair task when switching back to fair will already be
>>> enqueued (attached) as a fair task in __sched_setscheduler() prior to
>>> the check_class_changed() call.
>>
>> Right, this is true for a queued !fair task: it will be enqueued (attached)
>> before check_class_changed().
>>
>> enqueue_task_fair()
>> enqueue_entity()
>> update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
>> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
>> __update_load_avg_se(now, cfs_rq, se) --> (1)
>> check_class_changed()
>> switched_to_fair()
>> attach_task_cfs_rq()
>> attach_entity_cfs_rq()
>> update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD)
>> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
>> __update_load_avg_se(now, cfs_rq, se) --> (2)
>>
>>
>> 1. for queued !fair: (1) delta = (now - last_update_time), where
>> last_update_time is the time of switched_from_fair().
>>
>> 2. for !queued !fair: (2) delta = (now - last_update_time), where
>> last_update_time is the time of switched_from_fair().
>>
>> The scenario in the commit message only covers the !queued !fair case;
>> I forgot the queued !fair case, but the problem is the same.
>
> OK, that makes sense to me then.
>
>>> I can't see how this would work with your proposed change of using
>>> last_update_time=0 for fair->!fair->fair class changes.
>> If we reset last_update_time=0 for the !fair task, then:
>>
>> 1. for queued !fair: we will not do (1), since the if condition is false.
>>
>> 2. for !queued !fair: we will not do (2), since the if condition is false.
> OK.
>
> [...]
>
>>>> This patch resets the sched_avg last_update_time to 0 to stop load
>>>> tracking for the !fair task; later, in switched_to_fair() ->
>>>> update_load_avg(), we can use its saved sched_avg.
>>>>
>>>> Signed-off-by: Chengming Zhou <[email protected]>
>>>> ---
>>>> kernel/sched/fair.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 50f65a2ede32..576028f5a09e 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
>>>> static void switched_from_fair(struct rq *rq, struct task_struct *p)
>>>> {
>>>> detach_task_cfs_rq(p);
>>>> +
>>>> +#ifdef CONFIG_SMP
>>>> + /* Stop load tracking for !fair task */
>
> You're not really stopping p->se load tracking by doing this. We don't
> do this outside fair anyway. IMHO, you freeze p->se's PELT _avg/_sum
> values to be used as initial values when re-entering fair.
>
Yes, you are right, this comment is misleading and wrong; I will delete it.
Thanks very much for your review!
>>>> + p->se.avg.last_update_time = 0;
>>>> +#endif
>>>> }
>>>>
>>>> static void switched_to_fair(struct rq *rq, struct task_struct *p)
>
On Wed, 13 Jul 2022 at 06:05, Chengming Zhou
<[email protected]> wrote:
>
> For the same reason as the previous commit: if we don't reset the
> sched_avg last_update_time to 0, then after a while, in switched_to_fair():
>
> switched_to_fair
> attach_task_cfs_rq
> attach_entity_cfs_rq
> update_load_avg
> __update_load_avg_se(now, cfs_rq, se)
>
> The delta (now - sa->last_update_time) will wrongly contribute to or
> decay sched_avg, depending on the task's running/runnable status at
> that time.
I think that the time spent in the !fair class always decays the signal
and never contributes to it.
This one is a bit more complex than the new-task one. Generally
speaking, I would say that we should decay the load while the task runs
as !fair, because the value becomes less and less relevant over time.
What's the meaning of PELT values when the task has run for seconds as
a FIFO task? The only case that would need particular attention is the
PI-boost case, but there is no way to know how long the task runs and
sleeps while !fair.
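A quick back-of-the-envelope illustration, assuming the standard PELT
half-life of ~32ms (util = 512 is just a made-up starting value, and
this is plain userspace C, not kernel code):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
            /* hypothetical util_avg saved at switched_from_fair() time */
            double util = 512.0;

            /*
             * PELT halves roughly every 32ms, so after t ms the old
             * value would be scaled by ~0.5^(t/32) had it been decayed.
             */
            for (int ms = 32; ms <= 2048; ms *= 4)
                    printf("after %4d ms as !fair: ~%g\n",
                           ms, util * pow(0.5, ms / 32.0));
            return 0;
    }

After a couple of seconds the correctly aged contribution is essentially
zero, which is why a value frozen seconds ago has lost its meaning.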
>
> This patch resets the sched_avg last_update_time to 0 to stop load
> tracking for the !fair task; later, in switched_to_fair() ->
> update_load_avg(), we can use its saved sched_avg.
>
> Signed-off-by: Chengming Zhou <[email protected]>
> ---
> kernel/sched/fair.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 50f65a2ede32..576028f5a09e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
> static void switched_from_fair(struct rq *rq, struct task_struct *p)
> {
> detach_task_cfs_rq(p);
> +
> +#ifdef CONFIG_SMP
> + /* Stop load tracking for !fair task */
> + p->se.avg.last_update_time = 0;
> +#endif
> }
>
> static void switched_to_fair(struct rq *rq, struct task_struct *p)
> --
> 2.36.1
>
On 2022/7/19 21:20, Vincent Guittot wrote:
> On Wed, 13 Jul 2022 at 06:05, Chengming Zhou
> <[email protected]> wrote:
>>
>> For the same reason as the previous commit: if we don't reset the
>> sched_avg last_update_time to 0, then after a while, in switched_to_fair():
>>
>> switched_to_fair
>> attach_task_cfs_rq
>> attach_entity_cfs_rq
>> update_load_avg
>> __update_load_avg_se(now, cfs_rq, se)
>>
>> The delta (now - sa->last_update_time) will wrongly contribute to or
>> decay sched_avg, depending on the task's running/runnable status at
>> that time.
>
> I think that the time spent in the !fair class always decays the signal
> and never contributes to it.
>
> This one is a bit more complex than the new-task one. Generally
> speaking, I would say that we should decay the load while the task runs
> as !fair, because the value becomes less and less relevant over time.
> What's the meaning of PELT values when the task has run for seconds as
> a FIFO task? The only case that would need particular attention is the
> PI-boost case, but there is no way to know how long the task runs and
> sleeps while !fair.
After thinking about it for a while, I agree with you that the PELT
values become meaningless after running as a FIFO task. So yes, it's
best for now to always decay them in switched_to_fair().
So in the next version:
1. don't set for !fair in post_init_entity_util_avg()
2. keep last_update_time when switched_from_fair()
3. should we still keep SKIP_AGE_LOAD? I don't understand its usage in
switched_to_fair() (see the sketch below). Or should we just delete it?
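For context on 3., the usage I'm looking at is in attach_entity_cfs_rq()
(paraphrased):

    static void attach_entity_cfs_rq(struct sched_entity *se)
    {
            struct cfs_rq *cfs_rq = cfs_rq_of(se);

            /*
             * With ATTACH_AGE_LOAD set, age the saved load up to now
             * before attaching; SKIP_AGE_LOAD keeps it as-is.
             */
            update_load_avg(cfs_rq, se,
                            sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
            attach_entity_load_avg(cfs_rq, se);
            ...
    }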
Thanks!
>
>>
>> This patch resets the sched_avg last_update_time to 0 to stop load
>> tracking for the !fair task; later, in switched_to_fair() ->
>> update_load_avg(), we can use its saved sched_avg.
>>
>> Signed-off-by: Chengming Zhou <[email protected]>
>> ---
>> kernel/sched/fair.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 50f65a2ede32..576028f5a09e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -11552,6 +11552,11 @@ static void attach_task_cfs_rq(struct task_struct *p)
>> static void switched_from_fair(struct rq *rq, struct task_struct *p)
>> {
>> detach_task_cfs_rq(p);
>> +
>> +#ifdef CONFIG_SMP
>> + /* Stop load tracking for !fair task */
>> + p->se.avg.last_update_time = 0;
>> +#endif
>> }
>>
>> static void switched_to_fair(struct rq *rq, struct task_struct *p)
>> --
>> 2.36.1
>>