In init_entity_runnable_average() the last_update_time is initialized to
zero. The task is given max load and utilization as a pessimistic
initial estimate.
But if in wake_up_new_task() the task is placed on a CPU other than
where it was created, __update_load_avg() will be called via
set_task_cpu() -> migrate_task_rq_fair() -> remove_entity_load_avg().
Since last_update_time is zero the delta will be huge and the task's
load will be entirely decayed away before it is enqueued at the
destination CPU.
If last_update_time is initialized to cfs_rq_clock_task() the load will
not go away, but it will also then be subtracted from the original CPU
in remove_entity_load_avg() if the task is placed on a different CPU,
which is bad since it was never added there before.
Thinking about this more, it seemed questionable to treat the assignment
of a task to a new CPU in wake_up_new_task() as a migration, given that
the task has never executed previously. Would it make sense to call
__set_task_cpu() there instead of set_task_cpu()?
thanks,
Steve
Hi Steve,
On Fri, Dec 11, 2015 at 06:01:45PM -0800, Steve Muckle wrote:
> In init_entity_runnable_average() the last_update_time is initialized to
> zero. The task is given max load and utilization as a pessimistic
> initial estimate.
>
> But if in wake_up_new_task() the task is placed on a CPU other than
> where it was created, __update_load_avg() will be called via
> set_task_cpu() -> migrate_task_rq_fair() -> remove_entity_load_avg().
>
> Since last_update_time is zero the delta will be huge and the task's
> load will be entirely decayed away before it is enqueued at the
> destination CPU.
Since the new task's last_update_time is equal to 0, it will not be decayed.
Hi Yuyang,
On 12/13/2015 11:13 AM, Yuyang Du wrote:
> Hi Steve,
>
> On Fri, Dec 11, 2015 at 06:01:45PM -0800, Steve Muckle wrote:
>> In init_entity_runnable_average() the last_update_time is initialized to
>> zero. The task is given max load and utilization as a pessimistic
>> initial estimate.
>>
>> But if in wake_up_new_task() the task is placed on a CPU other than
>> where it was created, __update_load_avg() will be called via
>> set_task_cpu() -> migrate_task_rq_fair() -> remove_entity_load_avg().
>>
>> Since last_update_time is zero the delta will be huge and the task's
>> load will be entirely decayed away before it is enqueued at the
>> destination CPU.
>
> Since the new task's last_update_time is equal to 0, it will not be decayed.
Can you point me to the code for that logic? I don't see anything that
prevents the decay when a newly woken task is placed on a different CPU
via the call chain I mentioned above. My testing also shows the load
being decayed to zero.
thanks
Steve
On Mon, Dec 14, 2015 at 04:41:14PM -0800, Steve Muckle wrote:
> Hi Yuyang,
>
> On 12/13/2015 11:13 AM, Yuyang Du wrote:
> > Hi Steve,
> >
> > On Fri, Dec 11, 2015 at 06:01:45PM -0800, Steve Muckle wrote:
> >> In init_entity_runnable_average() the last_update_time is initialized to
> >> zero. The task is given max load and utilization as a pessimistic
> >> initial estimate.
> >>
> >> But if in wake_up_new_task() the task is placed on a CPU other than
> >> where it was created, __update_load_avg() will be called via
> >> set_task_cpu() -> migrate_task_rq_fair() -> remove_entity_load_avg().
> >>
> >> Since last_update_time is zero the delta will be huge and the task's
> >> load will be entirely decayed away before it is enqueued at the
> >> destination CPU.
> >
> > Since the new task's last_update_time is equal to 0, it will not be decayed.
>
> Can you point me to the code for that logic? I don't see anything that
> prevents the decay when a newly woken task is placed on a different CPU
> via the call chain I mentioned above. My testing also shows the load
> being decayed to zero.
>
You may search for last_update_time and see that it is treated differently
when it is 0. Hope this is helpful.
On 12/14/2015 06:24 PM, Yuyang Du wrote:
>>> On Fri, Dec 11, 2015 at 06:01:45PM -0800, Steve Muckle wrote:
>>>> In init_entity_runnable_average() the last_update_time is initialized to
>>>> zero. The task is given max load and utilization as a pessimistic
>>>> initial estimate.
>>>>
>>>> But if in wake_up_new_task() the task is placed on a CPU other than
>>>> where it was created, __update_load_avg() will be called via
>>>> set_task_cpu() -> migrate_task_rq_fair() -> remove_entity_load_avg().
>>>>
>>>> Since last_update_time is zero the delta will be huge and the task's
>>>> load will be entirely decayed away before it is enqueued at the
>>>> destination CPU.
>>>
>>> Since the new task's last_update_time is equal to 0, it will not be decayed.
>>
>> Can you point me to the code for that logic? I don't see anything that
>> prevents the decay when a newly woken task is placed on a different CPU
>> via the call chain I mentioned above. My testing also shows the load
>> being decayed to zero.
>>
> You may search for last_update_time and see that it is treated differently
> when it is 0. Hope this is helpful.
Are you referring to the test in enqueue_entity_load_avg()? If so that
isn't called until after remove_entity_load_avg() in this scenario,
which has no check on last_update_time.
thanks,
Steve
On Tue, Dec 15, 2015 at 10:45:53AM -0800, Steve Muckle wrote:
> On 12/14/2015 06:24 PM, Yuyang Du wrote:
> >>> On Fri, Dec 11, 2015 at 06:01:45PM -0800, Steve Muckle wrote:
> >>>> In init_entity_runnable_average() the last_update_time is initialized to
> >>>> zero. The task is given max load and utilization as a pessimistic
> >>>> initial estimate.
> >>>>
> >>>> But if in wake_up_new_task() the task is placed on a CPU other than
> >>>> where it was created, __update_load_avg() will be called via
> >>>> set_task_cpu() -> migrate_task_rq_fair() -> remove_entity_load_avg().
> >>>>
> >>>> Since last_update_time is zero the delta will be huge and the task's
> >>>> load will be entirely decayed away before it is enqueued at the
> >>>> destination CPU.
> >>>
> >>> Since the new task's last_update_time is equal to 0, it will not be decayed.
> >>
> >> Can you point me to the code for that logic? I don't see anything that
> >> prevents the decay when a newly woken task is placed on a different CPU
> >> via the call chain I mentioned above. My testing also shows the load
> >> being decayed to zero.
> >>
> > You may search for last_update_time and see that it is treated differently
> > when it is 0. Hope this is helpful.
>
> Are you referring to the test in enqueue_entity_load_avg()? If so that
> isn't called until after remove_entity_load_avg() in this scenario,
> which has no check on last_update_time.
Indeed it is. Sorry that I did not look at this carefully before.
I think it should still be regarded as a migration; it looks better as such.
I hope the following patch works.
---
Subject: [PATCH] sched: Fix new task's load avg removed from source CPU in
wake_up_new_task()
If a newly created task is selected to go to a different CPU during fork
balancing when it wakes up for the first time, its load averages should
not be removed from the source CPU, since they were never added there
in the first place.

Fix it in remove_entity_load_avg(): when the entity's last_update_time
is 0, simply return. This precisely identifies the case in question,
because in a normal migration the last_update_time is set to 0 only
after remove_entity_load_avg().
Reported-by: Steve Muckle <[email protected]>
Signed-off-by: Yuyang Du <[email protected]>
---
kernel/sched/fair.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3266eb..4676988 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2909,6 +2909,12 @@ void remove_entity_load_avg(struct sched_entity *se)
struct cfs_rq *cfs_rq = cfs_rq_of(se);
u64 last_update_time;
+ /*
+ * Newly created task should not be removed from the source CPU before migration
+ */
+ if (se->avg.last_update_time == 0)
+ return;
+
#ifndef CONFIG_64BIT
u64 last_update_time_copy;
--
Hi Yuyang,
[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.4-rc5 next-20151216]
url: https://github.com/0day-ci/linux/commits/Yuyang-Du/sched-Fix-new-task-s-load-avg-removed-from-source-CPU-in/20151216-154529
config: i386-randconfig-x000-12141102 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
kernel/sched/fair.c: In function 'remove_entity_load_avg':
>> kernel/sched/fair.c:2919:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
u64 last_update_time_copy;
^
vim +2919 kernel/sched/fair.c
9ee474f5 Paul Turner 2012-10-04 2903 /*
9d89c257 Yuyang Du 2015-07-15 2904 * Task first catches up with cfs_rq, and then subtract
9d89c257 Yuyang Du 2015-07-15 2905 * itself from the cfs_rq (task must be off the queue now).
9ee474f5 Paul Turner 2012-10-04 2906 */
9d89c257 Yuyang Du 2015-07-15 2907 void remove_entity_load_avg(struct sched_entity *se)
2dac754e Paul Turner 2012-10-04 2908 {
9d89c257 Yuyang Du 2015-07-15 2909 struct cfs_rq *cfs_rq = cfs_rq_of(se);
9d89c257 Yuyang Du 2015-07-15 2910 u64 last_update_time;
9d89c257 Yuyang Du 2015-07-15 2911
a13869ac Yuyang Du 2015-12-16 2912 /*
a13869ac Yuyang Du 2015-12-16 2913 * Newly created task should not be removed from the source CPU before migration
a13869ac Yuyang Du 2015-12-16 2914 */
a13869ac Yuyang Du 2015-12-16 2915 if (se->avg.last_update_time == 0)
a13869ac Yuyang Du 2015-12-16 2916 return;
a13869ac Yuyang Du 2015-12-16 2917
9d89c257 Yuyang Du 2015-07-15 2918 #ifndef CONFIG_64BIT
9d89c257 Yuyang Du 2015-07-15 @2919 u64 last_update_time_copy;
9ee474f5 Paul Turner 2012-10-04 2920
9d89c257 Yuyang Du 2015-07-15 2921 do {
9d89c257 Yuyang Du 2015-07-15 2922 last_update_time_copy = cfs_rq->load_last_update_time_copy;
9d89c257 Yuyang Du 2015-07-15 2923 smp_rmb();
9d89c257 Yuyang Du 2015-07-15 2924 last_update_time = cfs_rq->avg.last_update_time;
9d89c257 Yuyang Du 2015-07-15 2925 } while (last_update_time != last_update_time_copy);
9d89c257 Yuyang Du 2015-07-15 2926 #else
9d89c257 Yuyang Du 2015-07-15 2927 last_update_time = cfs_rq->avg.last_update_time;
:::::: The code at line 2919 was first introduced by commit
:::::: 9d89c257dfb9c51a532d69397f6eed75e5168c35 sched/fair: Rewrite runnable load and utilization average tracking
:::::: TO: Yuyang Du <[email protected]>
:::::: CC: Ingo Molnar <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Steve,
On Wed, Dec 16, 2015 at 06:50:43PM -0800, Steve Muckle wrote:
> On 12/15/2015 03:55 PM, Yuyang Du wrote:
> > I hope the following patch works.
>
> Thanks Yuyang. AFAICS it should work, though I believe the test on
> last_update_time could instead go at the top of migrate_task_rq_fair().
> It'd save the function call to remove_entity_load_avg() and two unnecessary
> assignments (since p->se.avg.last_update_time and p->se.exec_start are
> already 0 for newly forked tasks). This worked for me.
In hindsight yesterday, this also occurred to me. But I think the fix is
also applicable to a group entity that is removed without ever having been
used, and such cases are not unlikely to happen.

To keep the code free of duplication, I still do this in
remove_entity_load_avg(). Does that make sense?
Sorry about the compile error.
---
Subject: [PATCH] sched: Fix new task's load avg removed from source CPU in
wake_up_new_task()
If a newly created task is selected to go to a different CPU during fork
balancing when it wakes up for the first time, its load averages should
not be removed from the source CPU, since they were never added there
in the first place. The same also applies to a never-used group entity.

Fix it in remove_entity_load_avg(): when the entity's last_update_time
is 0, simply return. This precisely identifies the case in question,
because in other migrations the last_update_time is set to 0 only
after remove_entity_load_avg().
Reported-by: Steve Muckle <[email protected]>
Signed-off-by: Yuyang Du <[email protected]>
---
kernel/sched/fair.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3266eb..3f6a8b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2908,10 +2908,18 @@ void remove_entity_load_avg(struct sched_entity *se)
{
struct cfs_rq *cfs_rq = cfs_rq_of(se);
u64 last_update_time;
-
#ifndef CONFIG_64BIT
u64 last_update_time_copy;
+#endif
+ /*
+ * Newly created task or never used group entity should not be removed
+ * from its (source) cfs_rq
+ */
+ if (se->avg.last_update_time == 0)
+ return;
+
+#ifndef CONFIG_64BIT
do {
last_update_time_copy = cfs_rq->load_last_update_time_copy;
smp_rmb();
--
On Thu, Dec 17, 2015 at 10:43:03AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 07:34:27AM +0800, Yuyang Du wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e3266eb..3f6a8b3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2908,10 +2908,18 @@ void remove_entity_load_avg(struct sched_entity *se)
> > {
> > struct cfs_rq *cfs_rq = cfs_rq_of(se);
> > u64 last_update_time;
> > -
> > #ifndef CONFIG_64BIT
> > u64 last_update_time_copy;
> > +#endif
> >
> > + /*
> > + * Newly created task or never used group entity should not be removed
> > + * from its (source) cfs_rq
> > + */
> > + if (se->avg.last_update_time == 0)
> > + return;
> > +
> > +#ifndef CONFIG_64BIT
> > do {
> > last_update_time_copy = cfs_rq->load_last_update_time_copy;
> > smp_rmb();
>
> So that ifdef stuff annoyed me a wee bit, so I did the below.
>
> Initially I wanted to do a macro that we could use for both this and the
> vruntime, but that didn't end up particularly pretty either, so I
> scrapped that.
Oh yeah, looks good. :)
> ---
> Subject: sched: Fix new task's load avg removed from source CPU in wake_up_new_task()
> From: Yuyang Du <[email protected]>
> Date: Thu, 17 Dec 2015 07:34:27 +0800
>
> If a newly created task is selected to go to a different CPU during fork
> balancing when it wakes up for the first time, its load averages should
> not be removed from the source CPU, since they were never added there
> in the first place. The same also applies to a never-used group entity.
>
> Fix it in remove_entity_load_avg(): when the entity's last_update_time
> is 0, simply return. This precisely identifies the case in question,
> because in other migrations the last_update_time is set to 0 only
> after remove_entity_load_avg().
>
> Reported-by: Steve Muckle <[email protected]>
> Signed-off-by: Yuyang Du <[email protected]>
> Cc: Patrick Bellasi <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Morten Rasmussen <[email protected]>
> [peterz: cfs_rq_last_update_time]
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> ---
> kernel/sched/fair.c | 38 ++++++++++++++++++++++++++++----------
> 1 file changed, 28 insertions(+), 10 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2809,27 +2809,45 @@ dequeue_entity_load_avg(struct cfs_rq *c
> max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
> }
>
> -/*
> - * Task first catches up with cfs_rq, and then subtract
> - * itself from the cfs_rq (task must be off the queue now).
> - */
> -void remove_entity_load_avg(struct sched_entity *se)
> -{
> - struct cfs_rq *cfs_rq = cfs_rq_of(se);
> - u64 last_update_time;
> -
> #ifndef CONFIG_64BIT
> +static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
> +{
> u64 last_update_time_copy;
> + u64 last_update_time;
>
> do {
> last_update_time_copy = cfs_rq->load_last_update_time_copy;
> smp_rmb();
> last_update_time = cfs_rq->avg.last_update_time;
> } while (last_update_time != last_update_time_copy);
> +
> + return last_update_time;
> +}
> #else
> - last_update_time = cfs_rq->avg.last_update_time;
> +static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
> +{
> + return cfs_rq->avg.last_update_time;
> +}
> #endif
>
> +/*
> + * Task first catches up with cfs_rq, and then subtract
> + * itself from the cfs_rq (task must be off the queue now).
> + */
> +void remove_entity_load_avg(struct sched_entity *se)
> +{
> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
> + u64 last_update_time;
> +
> + /*
> + * Newly created task or never used group entity should not be removed
> + * from its (source) cfs_rq
> + */
> + if (se->avg.last_update_time == 0)
> + return;
> +
> + last_update_time = cfs_rq_last_update_time(cfs_rq);
> +
> __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
> atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
> atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
On 12/15/2015 03:55 PM, Yuyang Du wrote:
> I hope the following patch works.
Thanks Yuyang. AFAICS it should work, though I believe the test on
last_update_time could instead go at the top of migrate_task_rq_fair().
It'd save the function call to remove_entity_load_avg() and two unnecessary
assignments (since p->se.avg.last_update_time and p->se.exec_start are
already 0 for newly forked tasks). This worked for me.
thanks,
Steve
On Thu, Dec 17, 2015 at 07:34:27AM +0800, Yuyang Du wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e3266eb..3f6a8b3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2908,10 +2908,18 @@ void remove_entity_load_avg(struct sched_entity *se)
> {
> struct cfs_rq *cfs_rq = cfs_rq_of(se);
> u64 last_update_time;
> -
> #ifndef CONFIG_64BIT
> u64 last_update_time_copy;
> +#endif
>
> + /*
> + * Newly created task or never used group entity should not be removed
> + * from its (source) cfs_rq
> + */
> + if (se->avg.last_update_time == 0)
> + return;
> +
> +#ifndef CONFIG_64BIT
> do {
> last_update_time_copy = cfs_rq->load_last_update_time_copy;
> smp_rmb();
So that ifdef stuff annoyed me a wee bit, so I did the below.
Initially I wanted to do a macro that we could use for both this and the
vruntime, but that didn't end up particularly pretty either, so I
scrapped that.
---
Subject: sched: Fix new task's load avg removed from source CPU in wake_up_new_task()
From: Yuyang Du <[email protected]>
Date: Thu, 17 Dec 2015 07:34:27 +0800
If a newly created task is selected to go to a different CPU during fork
balancing when it wakes up for the first time, its load averages should
not be removed from the source CPU, since they were never added there
in the first place. The same also applies to a never-used group entity.

Fix it in remove_entity_load_avg(): when the entity's last_update_time
is 0, simply return. This precisely identifies the case in question,
because in other migrations the last_update_time is set to 0 only
after remove_entity_load_avg().
Reported-by: Steve Muckle <[email protected]>
Signed-off-by: Yuyang Du <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Morten Rasmussen <[email protected]>
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2809,27 +2809,45 @@ dequeue_entity_load_avg(struct cfs_rq *c
max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
}
-/*
- * Task first catches up with cfs_rq, and then subtract
- * itself from the cfs_rq (task must be off the queue now).
- */
-void remove_entity_load_avg(struct sched_entity *se)
-{
- struct cfs_rq *cfs_rq = cfs_rq_of(se);
- u64 last_update_time;
-
#ifndef CONFIG_64BIT
+static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
+{
u64 last_update_time_copy;
+ u64 last_update_time;
do {
last_update_time_copy = cfs_rq->load_last_update_time_copy;
smp_rmb();
last_update_time = cfs_rq->avg.last_update_time;
} while (last_update_time != last_update_time_copy);
+
+ return last_update_time;
+}
#else
- last_update_time = cfs_rq->avg.last_update_time;
+static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
+{
+ return cfs_rq->avg.last_update_time;
+}
#endif
+/*
+ * Task first catches up with cfs_rq, and then subtract
+ * itself from the cfs_rq (task must be off the queue now).
+ */
+void remove_entity_load_avg(struct sched_entity *se)
+{
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ u64 last_update_time;
+
+ /*
+ * Newly created task or never used group entity should not be removed
+ * from its (source) cfs_rq
+ */
+ if (se->avg.last_update_time == 0)
+ return;
+
+ last_update_time = cfs_rq_last_update_time(cfs_rq);
+
__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);