When create a new task, we initialize vruntime of the new task
at sched_cgroup_fork(). However, the timing of executing this
action is too early and may not be accurate.
Because it use current cpu to init the vruntime, but the new
task actually runs on the cpu which be assigned at wake_up_new_task().
To optimize this case, we pass ENQUEUE_INITIAL flag to
activate_task() in wake_up_new_task(), in this way,
when place_entity is called in enqueue_entity(), the
vruntime of the new task will be initialized. At the same
time, place_entity in task_fork_fair() is useless, remove it.
Signed-off-by: Zhang Qiao <[email protected]>
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 16 ----------------
2 files changed, 1 insertion(+), 17 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bcf2c4cc0522..b4ff595a2dc8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4897,7 +4897,7 @@ void wake_up_new_task(struct task_struct *p)
update_rq_clock(rq);
post_init_entity_util_avg(p);
- activate_task(rq, p, ENQUEUE_NOCLOCK);
+ activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
trace_sched_wakeup_new(p);
wakeup_preempt(rq, p, WF_FORK);
#ifdef CONFIG_SMP
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index efce2d36a783..bb5f376fd51e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12702,23 +12702,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
*/
static void task_fork_fair(struct task_struct *p)
{
- struct sched_entity *se = &p->se, *curr;
- struct cfs_rq *cfs_rq;
- struct rq *rq = this_rq();
- struct rq_flags rf;
-
- rq_lock(rq, &rf);
-
set_task_max_allowed_capacity(p);
-
- cfs_rq = task_cfs_rq(current);
- curr = cfs_rq->curr;
- if (curr) {
- update_rq_clock(rq);
- update_curr(cfs_rq);
- }
- place_entity(cfs_rq, se, ENQUEUE_INITIAL);
- rq_unlock(rq, &rf);
}
/*
--
2.18.0.huawei.25
On Thu, Jun 06, 2024 at 08:11:33PM +0800, Zhang Qiao wrote:
> When create a new task, we initialize vruntime of the new task
> at sched_cgroup_fork(). However, the timing of executing this
> action is too early and may not be accurate.
>
> Because it use current cpu to init the vruntime, but the new
> task actually runs on the cpu which be assigned at wake_up_new_task().
>
> To optimize this case, we pass ENQUEUE_INITIAL flag to
> activate_task() in wake_up_new_task(), in this way,
> when place_entity is called in enqueue_entity(), the
> vruntime of the new task will be initialized. At the same
> time, place_entity in task_fork_fair() is useless, remove it.
The better argument would've looked at history to see why the code was
the way it is and then verify those reasons are no longer valid.
Specifically, I think these are remains of child_runs_first, and that is
now gone.
Can you verify and update accordingly?
> Signed-off-by: Zhang Qiao <[email protected]>
> ---
> kernel/sched/core.c | 2 +-
> kernel/sched/fair.c | 16 ----------------
> 2 files changed, 1 insertion(+), 17 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bcf2c4cc0522..b4ff595a2dc8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4897,7 +4897,7 @@ void wake_up_new_task(struct task_struct *p)
> update_rq_clock(rq);
> post_init_entity_util_avg(p);
>
> - activate_task(rq, p, ENQUEUE_NOCLOCK);
> + activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
> trace_sched_wakeup_new(p);
> wakeup_preempt(rq, p, WF_FORK);
> #ifdef CONFIG_SMP
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index efce2d36a783..bb5f376fd51e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12702,23 +12702,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
> */
> static void task_fork_fair(struct task_struct *p)
> {
> - struct sched_entity *se = &p->se, *curr;
> - struct cfs_rq *cfs_rq;
> - struct rq *rq = this_rq();
> - struct rq_flags rf;
> -
> - rq_lock(rq, &rf);
> -
> set_task_max_allowed_capacity(p);
> -
> - cfs_rq = task_cfs_rq(current);
> - curr = cfs_rq->curr;
> - if (curr) {
> - update_rq_clock(rq);
> - update_curr(cfs_rq);
> - }
> - place_entity(cfs_rq, se, ENQUEUE_INITIAL);
> - rq_unlock(rq, &rf);
> }
>
> /*
> --
> 2.18.0.huawei.25
>
在 2024/6/7 18:30, Peter Zijlstra 写道:
> On Thu, Jun 06, 2024 at 08:11:33PM +0800, Zhang Qiao wrote:
>> When create a new task, we initialize vruntime of the new task
>> at sched_cgroup_fork(). However, the timing of executing this
>> action is too early and may not be accurate.
>>
>> Because it use current cpu to init the vruntime, but the new
>> task actually runs on the cpu which be assigned at wake_up_new_task().
>>
>> To optimize this case, we pass ENQUEUE_INITIAL flag to
>> activate_task() in wake_up_new_task(), in this way,
>> when place_entity is called in enqueue_entity(), the
>> vruntime of the new task will be initialized. At the same
>> time, place_entity in task_fork_fair() is useless, remove it.
>
> The better argument would've looked at history to see why the code was
> the way it is and then verify those reasons are no longer valid.
>
> Specifically, I think these are remains of child_runs_first, and that is
> now gone.> > Can you verify and update accordingly?
Initially, __enqueue_entity() was in task_new_fair(), in order to schedule
according to se->vruntime, only a"se->vruntime = cfs_rq->min_vruntime" was
added here. This modification was introduced by commit e9acbff648 ("sched: introduce se->vruntime").
Then,the commit 4d78e7b656aa("sched: new task placement for vruntime") added proper
new task placement for the vruntime based math, this also requires the new task's vruntime value.
The commit aeb73b040399("sched: clean up new task placement") clean up code and extract
a place_entity() helper function.
To summarize, the place_entity() in task_fork_fair() was for the child_runs_first and enqueue_entity,
After remove the child_runs_first and enqueue_entity from task_fork_fair(), we can remove this place_entity().
>
>> Signed-off-by: Zhang Qiao <[email protected]>
>> ---
>> kernel/sched/core.c | 2 +-
>> kernel/sched/fair.c | 16 ----------------
>> 2 files changed, 1 insertion(+), 17 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index bcf2c4cc0522..b4ff595a2dc8 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4897,7 +4897,7 @@ void wake_up_new_task(struct task_struct *p)
>> update_rq_clock(rq);
>> post_init_entity_util_avg(p);
>>
>> - activate_task(rq, p, ENQUEUE_NOCLOCK);
>> + activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
>> trace_sched_wakeup_new(p);
>> wakeup_preempt(rq, p, WF_FORK);
>> #ifdef CONFIG_SMP
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index efce2d36a783..bb5f376fd51e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -12702,23 +12702,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>> */
>> static void task_fork_fair(struct task_struct *p)
>> {
>> - struct sched_entity *se = &p->se, *curr;
>> - struct cfs_rq *cfs_rq;
>> - struct rq *rq = this_rq();
>> - struct rq_flags rf;
>> -
>> - rq_lock(rq, &rf);
>> -
>> set_task_max_allowed_capacity(p);
>> -
>> - cfs_rq = task_cfs_rq(current);
>> - curr = cfs_rq->curr;
>> - if (curr) {
>> - update_rq_clock(rq);
>> - update_curr(cfs_rq);
>> - }
>> - place_entity(cfs_rq, se, ENQUEUE_INITIAL);
>> - rq_unlock(rq, &rf);
>> }
>>
>> /*
>> --
>> 2.18.0.huawei.25
>>
> .
>
> When create a new task, …
creating?
> Because it use current cpu …
uses current CPU?
> To optimize this case, we pass ENQUEUE_INITIAL flag to
> activate_task() in wake_up_new_task(), in this way,
> when place_entity is called in enqueue_entity(), the
…
Please improve such a change description also according to word wrapping
because of a bit longer text line lengths.
Regards,
Markus