From: Byungchul Park <[email protected]>
Hello,

There are three problems when attaching a se to, or detaching it from, its cfs_rq.

Problem 1: a se's average load is not accounted to the new cfs_rq in the
queued case when a task changes its cgroup.

Problem 2: cfs_rq->avg.load_avg grows larger and larger every time the
task's cgroup is changed. We have to sync the se's average load with the
prev cfs_rq (illustrated by the toy sketch below).

Problem 3: the current code does not sync a se with its cfs_rq when
switching its sched class to the fair class. If we could always assume
that a se has been detached from the fair class long enough for its
average load to become useless by the time it is attached to its fair
class cfs_rq, then the current code would be acceptable. But that
assumption is not always true.
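To make problem 2 concrete, here is a small stand-alone toy (my own
illustration in plain C, not kernel code; the toy_* names and numbers are
made up) of why attaching to the next cfs_rq without detaching from the
prev one makes the accounted load grow on every cgroup change:

    #include <stdio.h>

    /* Toy stand-in for cfs_rq->avg.load_avg; not the kernel's structures. */
    struct toy_cfs_rq { long load_avg; };

    static void toy_attach(struct toy_cfs_rq *cfs_rq, long se_load)
    {
        cfs_rq->load_avg += se_load;        /* what the next cfs_rq gains */
    }

    static void toy_detach(struct toy_cfs_rq *cfs_rq, long se_load)
    {
        cfs_rq->load_avg -= se_load;        /* what the prev cfs_rq must lose */
        if (cfs_rq->load_avg < 0)
            cfs_rq->load_avg = 0;
    }

    int main(void)
    {
        struct toy_cfs_rq cg_a = { 0 }, cg_b = { 0 };
        long se_load = 100;

        /* Current behaviour: never detach from the prev cfs_rq. */
        toy_attach(&cg_a, se_load);         /* A = 100 */
        toy_attach(&cg_b, se_load);         /* B = 100, A still 100 (leaked) */
        toy_attach(&cg_a, se_load);         /* A = 200, grows on every move */
        printf("buggy: A=%ld B=%ld\n", cg_a.load_avg, cg_b.load_avg);

        /* With the idea of patch 3: detach from prev, then attach to next. */
        cg_a.load_avg = cg_b.load_avg = 0;
        toy_attach(&cg_a, se_load);         /* A = 100 */
        toy_detach(&cg_a, se_load);
        toy_attach(&cg_b, se_load);         /* A = 0, B = 100 */
        printf("fixed: A=%ld B=%ld\n", cg_a.load_avg, cg_b.load_avg);
        return 0;
    }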
Patch 1/5 does code refactoring for the following patches.
Patch 2/5 solves problem 1.
Patch 3/5 solves problem 2.
Patch 4/5 solves problem 3.
Patch 5/5 does code refactoring for better readability.
Changes from v2 to v1 (logic is not changed at all):
* fix up my poor English in commit messages and comments
* break big patches into smaller ones so they are easier to review
* supplement the cover letter

Changes from v1 to v2:
* introduce two functions for adjusting vruntime and load when attaching
  and detaching
* in task_move_group_fair(), call the introduced functions instead of
  switched_from(to)_fair() directly
* add decaying logic for a se which has been detached from a cfs_rq

Thanks,
Byungchul
Byungchul Park (5):
sched: add two functions adjusting cfs_rq's load when att(det)aching
a se
sched: make task_move_group_fair adjust cfs_rq's load in case of
queued
sched: sync a se with prev cfs_rq when changing cgroup
sched: sync a se with its cfs_rq when switching sched class to fair
class
sched: add two functions for att(det)aching a task to(from) a cfs_rq
kernel/sched/fair.c | 209 +++++++++++++++++++++++++++------------------------
1 file changed, 110 insertions(+), 99 deletions(-)
--
1.7.9.5
From: Byungchul Park <[email protected]>
This patch introduces two functions that adjust a cfs_rq's average load
when attaching a se to, or detaching it from, the cfs_rq, and makes the
existing code use them.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/fair.c | 58 +++++++++++++++++++++++++++------------------------
1 file changed, 31 insertions(+), 27 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4d5f97b..7475a40 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2709,6 +2709,31 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
update_tg_load_avg(cfs_rq, 0);
}
+static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ se->avg.last_update_time = cfs_rq->avg.last_update_time;
+ cfs_rq->avg.load_avg += se->avg.load_avg;
+ cfs_rq->avg.load_sum += se->avg.load_sum;
+ cfs_rq->avg.util_avg += se->avg.util_avg;
+ cfs_rq->avg.util_sum += se->avg.util_sum;
+}
+
+static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
+
+ cfs_rq->avg.load_avg =
+ max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
+ cfs_rq->avg.load_sum =
+ max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
+ cfs_rq->avg.util_avg =
+ max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
+ cfs_rq->avg.util_sum =
+ max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+}
+
/* Add the load generated by se into cfs_rq's load average */
static inline void
enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
u64 now = cfs_rq_clock_task(cfs_rq);
int migrated = 0, decayed;
- if (sa->last_update_time == 0) {
- sa->last_update_time = now;
+ if (sa->last_update_time == 0)
migrated = 1;
- }
- else {
+ else
__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
se->on_rq * scale_load_down(se->load.weight),
cfs_rq->curr == se, NULL);
- }
decayed = update_cfs_rq_load_avg(now, cfs_rq);
cfs_rq->runnable_load_avg += sa->load_avg;
cfs_rq->runnable_load_sum += sa->load_sum;
- if (migrated) {
- cfs_rq->avg.load_avg += sa->load_avg;
- cfs_rq->avg.load_sum += sa->load_sum;
- cfs_rq->avg.util_avg += sa->util_avg;
- cfs_rq->avg.util_sum += sa->util_sum;
- }
+ if (migrated)
+ attach_entity_load_avg(cfs_rq, se);
if (decayed || migrated)
update_tg_load_avg(cfs_rq, 0);
@@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
#ifdef CONFIG_SMP
/* Catch up with the cfs_rq and remove our load when we leave */
- __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), &se->avg,
- se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
-
- cfs_rq->avg.load_avg =
- max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
- cfs_rq->avg.load_sum =
- max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
- cfs_rq->avg.util_avg =
- max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
- cfs_rq->avg.util_sum =
- max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+ detach_entity_load_avg(cfs_rq, se);
#endif
}
@@ -8042,11 +8050,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
#ifdef CONFIG_SMP
/* Virtually synchronize task with its new cfs_rq */
- p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
- cfs_rq->avg.load_avg += p->se.avg.load_avg;
- cfs_rq->avg.load_sum += p->se.avg.load_sum;
- cfs_rq->avg.util_avg += p->se.avg.util_avg;
- cfs_rq->avg.util_sum += p->se.avg.util_sum;
+ attach_entity_load_avg(cfs_rq, se);
#endif
}
}
--
1.7.9.5
From: Byungchul Park <[email protected]>
A se's average load should be added to the new cfs_rq not only in the
!queued case but also in the queued case.

Of course, the older code managed a cfs_rq's blocked load separately, and
there the blocked load was meaningful only while the se was !queued. Now
that the load tracking code has changed, that is no longer true, so the
code adjusting the cfs_rq's average load has to be changed as well.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/fair.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7475a40..191d9be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8044,15 +8044,13 @@ static void task_move_group_fair(struct task_struct *p, int queued)
se->vruntime -= cfs_rq_of(se)->min_vruntime;
set_task_rq(p, task_cpu(p));
se->depth = se->parent ? se->parent->depth + 1 : 0;
- if (!queued) {
- cfs_rq = cfs_rq_of(se);
+ cfs_rq = cfs_rq_of(se);
+ if (!queued)
se->vruntime += cfs_rq->min_vruntime;
-
#ifdef CONFIG_SMP
- /* Virtually synchronize task with its new cfs_rq */
- attach_entity_load_avg(cfs_rq, se);
+ /* Virtually synchronize task with its new cfs_rq */
+ attach_entity_load_avg(cfs_rq, se);
#endif
- }
}
void free_fair_sched_group(struct task_group *tg)
--
1.7.9.5
From: Byungchul Park <[email protected]>
The current code gets a cfs_rq's average load wrong when moving a task
from one cfs_rq to another. I tested with "echo pid > cgroup" and found
that e.g. cfs_rq->avg.load_avg grew larger and larger every time the
task's cgroup was changed, again and again. We have to sync the se's
average load with both the *prev* cfs_rq and the next cfs_rq when
changing its cgroup.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/fair.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 191d9be..1be042a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8040,8 +8040,14 @@ static void task_move_group_fair(struct task_struct *p, int queued)
if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
queued = 1;
+ cfs_rq = cfs_rq_of(se);
if (!queued)
- se->vruntime -= cfs_rq_of(se)->min_vruntime;
+ se->vruntime -= cfs_rq->min_vruntime;
+
+#ifdef CONFIG_SMP
+ /* synchronize task with its prev cfs_rq */
+ detach_entity_load_avg(cfs_rq, se);
+#endif
set_task_rq(p, task_cpu(p));
se->depth = se->parent ? se->parent->depth + 1 : 0;
cfs_rq = cfs_rq_of(se);
--
1.7.9.5
From: Byungchul Park <[email protected]>
We have to sync a se with its cfs_rq when switching its sched class to
the fair class.

The current code does not sync it, on the assumption that the se's
average load is no longer valid once it has been detached for a long
time. However, the se's average load can still be valid right after being
detached from the cfs_rq, so we need to sync it in that case, e.g.
priority inheritance.

To solve the problem that a se's average load becomes less useful the
longer it stays detached, this patch decays the average load over the
duration the se has been detached, at the time the se gets attached back
to a cfs_rq.
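For a rough feel of how much such a decay matters, here is a stand-alone
toy in plain C (my own illustration with made-up numbers, not the kernel's
fixed-point code; PELT halves a contribution roughly every 32 periods of
about 1ms each, i.e. y^32 = 0.5):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double y = pow(0.5, 1.0 / 32.0);    /* per-period decay factor, y^32 = 0.5 */
        double load_at_detach = 1024.0;     /* hypothetical se load when detached */
        int detached_periods = 10;          /* e.g. a short priority-inheritance boost */

        /* Decay the stale load over the detached duration before attaching it. */
        double load_to_attach = load_at_detach * pow(y, detached_periods);

        printf("attach %.1f instead of %.1f after %d periods detached\n",
               load_to_attach, load_at_detach, detached_periods);
        return 0;
    }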
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/fair.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1be042a..3419f6c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
+ /*
+ * in case of migration and cgroup-change, more care should be taken
+ * because se's cfs_rq was changed, that means calling __update_load_avg
+ * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
+ * update here. we have to update it with prev cfs_rq just before changing
+ * se's cfs_rq, and get here soon.
+ */
+ if (se->avg.last_update_time)
+ __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, 0, 0, NULL);
+
se->avg.last_update_time = cfs_rq->avg.last_update_time;
cfs_rq->avg.load_avg += se->avg.load_avg;
cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -7948,6 +7959,11 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
se->depth = se->parent ? se->parent->depth + 1 : 0;
#endif
+#ifdef CONFIG_SMP
+ /* synchronize task with its cfs_rq */
+ attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+#endif
+
if (!task_on_rq_queued(p)) {
/*
@@ -8049,6 +8065,10 @@ static void task_move_group_fair(struct task_struct *p, int queued)
detach_entity_load_avg(cfs_rq, se);
#endif
set_task_rq(p, task_cpu(p));
+
+ /* tell se's cfs_rq has been changed */
+ p->se.avg.last_update_time = 0;
+
se->depth = se->parent ? se->parent->depth + 1 : 0;
cfs_rq = cfs_rq_of(se);
if (!queued)
--
1.7.9.5
From: Byungchul Park <[email protected]>
This patch introduces two functions for adjusting a task's vruntime and
the target cfs_rq's average load, so that redundant code doing the same
things can be simplified; it also improves readability.

switched_from_fair(), switched_to_fair() and task_move_group_fair() can
use the introduced functions and become simpler.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/fair.c | 143 +++++++++++++++++++++++----------------------------
1 file changed, 63 insertions(+), 80 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3419f6c..242f0d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7915,21 +7915,29 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
check_preempt_curr(rq, p, 0);
}
-static void switched_from_fair(struct rq *rq, struct task_struct *p)
+static void detach_task_cfs_rq(struct task_struct *p, int queued)
{
struct sched_entity *se = &p->se;
struct cfs_rq *cfs_rq = cfs_rq_of(se);
/*
- * Ensure the task's vruntime is normalized, so that when it's
- * switched back to the fair class the enqueue_entity(.flags=0) will
- * do the right thing.
+ * Ensure the task's vruntime is normalized, so that after attaching
+ * back it to a cfs_rq the enqueue_entity() will do the right thing.
*
* If it's queued, then the dequeue_entity(.flags=0) will already
* have normalized the vruntime, if it's !queued, then only when
- * the task is sleeping will it still have non-normalized vruntime.
+ * the task is sleeping it still has a non-normalized vruntime.
+ *
+ * When !queued, vruntime of the task has usually NOT been normalized.
+ * But there are some cases where it has already been normalized:
+ *
+ * - Moving a forked child which is waiting for being woken up by
+ * wake_up_new_task().
+ * - Moving a task which has been woken up by try_to_wake_up() and
+ * waiting for actually being woken up by sched_ttwu_pending().
*/
- if (!task_on_rq_queued(p) && p->state != TASK_RUNNING) {
+ if (!queued && p->state != TASK_RUNNING &&
+ se->sum_exec_runtime && p->state != TASK_WAKING) {
/*
* Fix up our vruntime so that the current sleep doesn't
* cause 'unlimited' sleep bonus.
@@ -7944,12 +7952,10 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
#endif
}
-/*
- * We switched to the sched_fair class.
- */
-static void switched_to_fair(struct rq *rq, struct task_struct *p)
+static void attach_task_cfs_rq(struct task_struct *p, int queued)
{
struct sched_entity *se = &p->se;
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
@@ -7961,34 +7967,57 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
#ifdef CONFIG_SMP
/* synchronize task with its cfs_rq */
- attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+ attach_entity_load_avg(cfs_rq, se);
#endif
- if (!task_on_rq_queued(p)) {
+ /*
+ * Ensure the task has a non-normalized vruntime when attaching it
+ * to a cfs_rq, so that enqueue_entity() at wake-up time will do
+ * the right thing.
+ *
+ * If it's queued, then the enqueue_entity(.flags=0) makes the task
+ * has non-normalized vruntime, if it's !queued, then it still has
+ * a normalized vruntime.
+ *
+ * When !queued, a task usually should have a non-normalized vruntime
+ * after being attached to a cfs_rq. But there are some cases where
+ * we should keep it normalized:
+ *
+ * - Moving a forked child which is waiting for being woken up by
+ * wake_up_new_task().
+ * - Moving a task which has been woken up by try_to_wake_up() and
+ * waiting for actually being woken up by sched_ttwu_pending().
+ */
+ if (!queued && p->state != TASK_RUNNING &&
+ se->sum_exec_runtime && p->state != TASK_WAKING)
+ se->vruntime += cfs_rq->min_vruntime;
+}
+
+static void switched_from_fair(struct rq *rq, struct task_struct *p)
+{
+ detach_task_cfs_rq(p, task_on_rq_queued(p));
+}
+/*
+ * We switched to the sched_fair class.
+ */
+static void switched_to_fair(struct rq *rq, struct task_struct *p)
+{
+ int queued = task_on_rq_queued(p);
+
+ attach_task_cfs_rq(p, queued);
+
+ if (queued) {
/*
- * Ensure the task has a non-normalized vruntime when it is switched
- * back to the fair class with !queued, so that enqueue_entity() at
- * wake-up time will do the right thing.
- *
- * If it's queued, then the enqueue_entity(.flags=0) makes the task
- * has non-normalized vruntime, if it's !queued, then it still has
- * normalized vruntime.
+ * We were most likely switched from sched_rt, so
+ * kick off the schedule if running, otherwise just see
+ * if we can still preempt the current task.
*/
- if (p->state != TASK_RUNNING)
- se->vruntime += cfs_rq_of(se)->min_vruntime;
- return;
+ if (rq->curr == p)
+ resched_curr(rq);
+ else
+ check_preempt_curr(rq, p, 0);
}
-
- /*
- * We were most likely switched from sched_rt, so
- * kick off the schedule if running, otherwise just see
- * if we can still preempt the current task.
- */
- if (rq->curr == p)
- resched_curr(rq);
- else
- check_preempt_curr(rq, p, 0);
}
/* Account for a task changing its policy or group.
@@ -8025,58 +8054,12 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
#ifdef CONFIG_FAIR_GROUP_SCHED
static void task_move_group_fair(struct task_struct *p, int queued)
{
- struct sched_entity *se = &p->se;
- struct cfs_rq *cfs_rq;
-
- /*
- * If the task was not on the rq at the time of this cgroup movement
- * it must have been asleep, sleeping tasks keep their ->vruntime
- * absolute on their old rq until wakeup (needed for the fair sleeper
- * bonus in place_entity()).
- *
- * If it was on the rq, we've just 'preempted' it, which does convert
- * ->vruntime to a relative base.
- *
- * Make sure both cases convert their relative position when migrating
- * to another cgroup's rq. This does somewhat interfere with the
- * fair sleeper stuff for the first placement, but who cares.
- */
- /*
- * When !queued, vruntime of the task has usually NOT been normalized.
- * But there are some cases where it has already been normalized:
- *
- * - Moving a forked child which is waiting for being woken up by
- * wake_up_new_task().
- * - Moving a task which has been woken up by try_to_wake_up() and
- * waiting for actually being woken up by sched_ttwu_pending().
- *
- * To prevent boost or penalty in the new cfs_rq caused by delta
- * min_vruntime between the two cfs_rqs, we skip vruntime adjustment.
- */
- if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
- queued = 1;
-
- cfs_rq = cfs_rq_of(se);
- if (!queued)
- se->vruntime -= cfs_rq->min_vruntime;
-
-#ifdef CONFIG_SMP
- /* synchronize task with its prev cfs_rq */
- detach_entity_load_avg(cfs_rq, se);
-#endif
+ detach_task_cfs_rq(p, queued);
set_task_rq(p, task_cpu(p));
/* tell se's cfs_rq has been changed */
p->se.avg.last_update_time = 0;
-
- se->depth = se->parent ? se->parent->depth + 1 : 0;
- cfs_rq = cfs_rq_of(se);
- if (!queued)
- se->vruntime += cfs_rq->min_vruntime;
-#ifdef CONFIG_SMP
- /* Virtually synchronize task with its new cfs_rq */
- attach_entity_load_avg(cfs_rq, se);
-#endif
+ attach_task_cfs_rq(p, queued);
}
void free_fair_sched_group(struct task_group *tg)
--
1.7.9.5
On Wed, Aug 19, 2015 at 03:47:11PM +0900, [email protected] wrote:
> From: Byungchul Park <[email protected]>
>
> hello,
>
> there are 3 problems when att(det)aching a se to(from) its cfs_rq.
>
> problem 1. se's average load is not accounted with new cfs_rq in queued case,
> when a task changes its cgroup.
>
> problem 2. cfs_rq->avg.load_avg becomes larger and larger whenever changing
> cgroup to another. we have to sync se's average load with prev cfs_rq.
>
> problem 3. current code does not sync it with its cfs_rq when switching
> sched class to fair class. if we can always assume that a se has been
> detached from fair class for a long time enough for its average load to
> become useless, at the time attaching it to its fair class cfs_rq, then
> current code is acceptable. but the assumption is not always true.
>
> patch 1/5, does code refactoring for further patches.
> patch 2/5, solves the problem 1.
> patch 3/5, solves the problem 2.
> patch 4/5, solves the problem 3.
> patch 5/5, does code refactoring for better readability.
>
> change from v2 to v1 (logic is not changed at all)
change from v2 to v3 (logic is not changed at all)
I am sorry for the typo.
> * fix up my poor english in commit message and comment
> * break down big patches into more patches for being reviewed easily
> * supplement cover letter messages
>
> change from v1 to v2
> * introduce two functions for adjusting vruntime and load when attaching
> and detaching.
> * call the introduced functions instead of switched_from(to)_fair() directly
> in task_move_group_fair().
> * add decaying logic for a se which has detached from a cfs_rq.
>
> thanks,
> byungchul
>
> Byungchul Park (5):
> sched: add two functions adjusting cfs_rq's load when att(det)aching
> a se
> sched: make task_move_group_fair adjust cfs_rq's load in case of
> queued
> sched: sync a se with prev cfs_rq when changing cgroup
> sched: sync a se with its cfs_rq when switching sched class to fair
> class
> sched: add two functions for att(det)aching a task to(from) a cfs_rq
>
> kernel/sched/fair.c | 209 +++++++++++++++++++++++++++------------------------
> 1 file changed, 110 insertions(+), 99 deletions(-)
>
> --
> 1.7.9.5
>
On Wed, Aug 19, 2015 at 03:47:15PM +0900, [email protected] wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1be042a..3419f6c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>
> static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + /*
> + * in case of migration and cgroup-change, more care should be taken
> + * because se's cfs_rq was changed, that means calling __update_load_avg
> + * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
> + * update here. we have to update it with prev cfs_rq just before changing
> + * se's cfs_rq, and get here soon.
> + */
> + if (se->avg.last_update_time)
> + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> + &se->avg, 0, 0, NULL);
> +
> se->avg.last_update_time = cfs_rq->avg.last_update_time;
> cfs_rq->avg.load_avg += se->avg.load_avg;
> cfs_rq->avg.load_sum += se->avg.load_sum;
you seem to have forgotten to remove the same logic from
enqueue_entity_load_avg(), which will now call __update_load_avg()
twice.
On Wed, Aug 19, 2015 at 07:12:41PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 19, 2015 at 03:47:15PM +0900, [email protected] wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1be042a..3419f6c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> >
> > static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > + /*
> > + * in case of migration and cgroup-change, more care should be taken
> > + * because se's cfs_rq was changed, that means calling __update_load_avg
> > + * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
> > + * update here. we have to update it with prev cfs_rq just before changing
> > + * se's cfs_rq, and get here soon.
> > + */
> > + if (se->avg.last_update_time)
> > + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > + &se->avg, 0, 0, NULL);
> > +
> > se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > cfs_rq->avg.load_avg += se->avg.load_avg;
> > cfs_rq->avg.load_sum += se->avg.load_sum;
>
> you seem to have forgotten to remove the same logic from
> enqueue_entity_load_avg(), which will now call __update_load_avg()
> twice.
In case of enqueue_entity_load_avg(), that seems to be ok.
However, the problem is that he made it "entangled":
In enqueue_entity_load_avg():
if (migrated)
attach_entity_load_avg();
while in attach_entity_load_avg():
if (!migrated)
__update_load_avg();
so, if attach() is called from enqueue(), that if() is never true.
To Byungchul,
1) I suggest you not entangle the entire series by mixing problem
solving with code manipulation. That said, it is better if you
first solve the "move between task group" problem and the
switch_to/from problem (if it is a problem; either way, comment
with your explanation of how you deal with the lost record and why).
2) After that, make the code cleaner, without changing the logic, and
especially avoid entangling the logic in order to do the code manipulation.
3) If you don't hate upper case letters, use them properly.
If it helps.
Thanks,
Yuyang
I did something like this on top.. please have a look at the XXX and
integrate.
---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2664,8 +2664,8 @@ static inline u64 cfs_rq_clock_task(stru
/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
{
- int decayed;
struct sched_avg *sa = &cfs_rq->avg;
+ int decayed;
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
@@ -2695,15 +2695,16 @@ static inline int update_cfs_rq_load_avg
static inline void update_load_avg(struct sched_entity *se, int update_tg)
{
struct cfs_rq *cfs_rq = cfs_rq_of(se);
- int cpu = cpu_of(rq_of(cfs_rq));
u64 now = cfs_rq_clock_task(cfs_rq);
+ int cpu = cpu_of(rq_of(cfs_rq));
/*
* Track task load average for carrying it to new CPU after migrated, and
* track group sched_entity load average for task_h_load calc in migration
*/
__update_load_avg(now, cpu, &se->avg,
- se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
+ se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
update_tg_load_avg(cfs_rq, 0);
@@ -2718,9 +2719,10 @@ static void attach_entity_load_avg(struc
* update here. we have to update it with prev cfs_rq just before changing
* se's cfs_rq, and get here soon.
*/
- if (se->avg.last_update_time)
+ if (se->avg.last_update_time) {
__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
- &se->avg, 0, 0, NULL);
+ &se->avg, 0, 0, NULL);
+ }
se->avg.last_update_time = cfs_rq->avg.last_update_time;
cfs_rq->avg.load_avg += se->avg.load_avg;
@@ -2732,17 +2734,13 @@ static void attach_entity_load_avg(struc
static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
- &se->avg, se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
+ &se->avg, se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
- cfs_rq->avg.load_avg =
- max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
- cfs_rq->avg.load_sum =
- max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
- cfs_rq->avg.util_avg =
- max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
- cfs_rq->avg.util_sum =
- max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+ cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
+ cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
+ cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
+ cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
}
/* Add the load generated by se into cfs_rq's load average */
@@ -2751,14 +2749,14 @@ enqueue_entity_load_avg(struct cfs_rq *c
{
struct sched_avg *sa = &se->avg;
u64 now = cfs_rq_clock_task(cfs_rq);
- int migrated = 0, decayed;
+ int migrated, decayed;
- if (sa->last_update_time == 0)
- migrated = 1;
- else
+ migrated = !sa->last_update_time;
+ if (!migrated) {
__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
se->on_rq * scale_load_down(se->load.weight),
cfs_rq->curr == se, NULL);
+ }
decayed = update_cfs_rq_load_avg(now, cfs_rq);
@@ -2781,7 +2779,7 @@ dequeue_entity_load_avg(struct cfs_rq *c
cfs_rq->runnable_load_avg =
max_t(long, cfs_rq->runnable_load_avg - se->avg.load_avg, 0);
cfs_rq->runnable_load_sum =
- max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
+ max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
}
/*
@@ -2849,6 +2847,11 @@ static inline void
dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
static inline void remove_entity_load_avg(struct sched_entity *se) {}
+static inline void
+attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void
+detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+
static inline int idle_balance(struct rq *rq)
{
return 0;
@@ -7915,29 +7918,50 @@ prio_changed_fair(struct rq *rq, struct
check_preempt_curr(rq, p, 0);
}
-static void detach_task_cfs_rq(struct task_struct *p, int queued)
+static inline bool vruntime_normalized(struct task_struct *p)
{
+ int queued = task_on_rq_queued(p);
struct sched_entity *se = &p->se;
- struct cfs_rq *cfs_rq = cfs_rq_of(se);
/*
- * Ensure the task's vruntime is normalized, so that after attaching
- * back it to a cfs_rq the enqueue_entity() will do the right thing.
- *
* If it's queued, then the dequeue_entity(.flags=0) will already
- * have normalized the vruntime, if it's !queued, then only when
- * the task is sleeping it still has a non-normalized vruntime.
- *
+ * have normalized the vruntime.
+ */
+ if (queued)
+ return true;
+
+ /*
* When !queued, vruntime of the task has usually NOT been normalized.
* But there are some cases where it has already been normalized:
- *
- * - Moving a forked child which is waiting for being woken up by
+ */
+
+ /*
+ * - a forked child which is waiting for being woken up by
* wake_up_new_task().
- * - Moving a task which has been woken up by try_to_wake_up() and
+ */
+ if (!se->sum_exec_runtime || p->state == TASK_WAKING)
+ return true;
+
+ /*
+ * - a task which has been woken up by try_to_wake_up() and
* waiting for actually being woken up by sched_ttwu_pending().
+ *
+ * XXX this appears wrong!! check history,
+ * we appear to always set queued and RUNNING under the same lock instance
+ * might be from before TASK_WAKING ?
*/
- if (!queued && p->state != TASK_RUNNING &&
- se->sum_exec_runtime && p->state != TASK_WAKING) {
+ if (p->state == TASK_RUNNING)
+ return true;
+
+ return false;
+}
+
+static void detach_task_cfs_rq(struct task_struct *p)
+{
+ struct sched_entity *se = &p->se;
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+ if (!vruntime_normalized(p)) {
/*
* Fix up our vruntime so that the current sleep doesn't
* cause 'unlimited' sleep bonus.
@@ -7946,10 +7970,8 @@ static void detach_task_cfs_rq(struct ta
se->vruntime -= cfs_rq->min_vruntime;
}
-#ifdef CONFIG_SMP
/* Catch up with the cfs_rq and remove our load when we leave */
detach_entity_load_avg(cfs_rq, se);
-#endif
}
static void attach_task_cfs_rq(struct task_struct *p, int queued)
@@ -7965,49 +7987,23 @@ static void attach_task_cfs_rq(struct ta
se->depth = se->parent ? se->parent->depth + 1 : 0;
#endif
-#ifdef CONFIG_SMP
/* synchronize task with its cfs_rq */
attach_entity_load_avg(cfs_rq, se);
-#endif
- /*
- * Ensure the task has a non-normalized vruntime when attaching it
- * to a cfs_rq, so that enqueue_entity() at wake-up time will do
- * the right thing.
- *
- * If it's queued, then the enqueue_entity(.flags=0) makes the task
- * has non-normalized vruntime, if it's !queued, then it still has
- * a normalized vruntime.
- *
- * When !queued, a task usually should have a non-normalized vruntime
- * after being attached to a cfs_rq. But there are some cases where
- * we should keep it normalized:
- *
- * - Moving a forked child which is waiting for being woken up by
- * wake_up_new_task().
- * - Moving a task which has been woken up by try_to_wake_up() and
- * waiting for actually being woken up by sched_ttwu_pending().
- */
- if (!queued && p->state != TASK_RUNNING &&
- se->sum_exec_runtime && p->state != TASK_WAKING)
+ if (!vruntime_normalized(p))
se->vruntime += cfs_rq->min_vruntime;
}
static void switched_from_fair(struct rq *rq, struct task_struct *p)
{
- detach_task_cfs_rq(p, task_on_rq_queued(p));
+ detach_task_cfs_rq(p);
}
-/*
- * We switched to the sched_fair class.
- */
static void switched_to_fair(struct rq *rq, struct task_struct *p)
{
- int queued = task_on_rq_queued(p);
-
- attach_task_cfs_rq(p, queued);
+ attach_task_cfs_rq(p);
- if (queued) {
+ if (task_on_rq_queued(p)) {
/*
* We were most likely switched from sched_rt, so
* kick off the schedule if running, otherwise just see
@@ -8056,8 +8052,7 @@ static void task_move_group_fair(struct
{
detach_task_cfs_rq(p, queued);
set_task_rq(p, task_cpu(p));
-
- /* tell se's cfs_rq has been changed */
+ /* tell se's cfs_rq has been changed -- migrated */
p->se.avg.last_update_time = 0;
attach_task_cfs_rq(p, queued);
}
On Wed, Aug 19, 2015 at 07:12:41PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 19, 2015 at 03:47:15PM +0900, [email protected] wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1be042a..3419f6c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> >
> > static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > + /*
> > + * in case of migration and cgroup-change, more care should be taken
> > + * because se's cfs_rq was changed, that means calling __update_load_avg
> > + * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
> > + * update here. we have to update it with prev cfs_rq just before changing
> > + * se's cfs_rq, and get here soon.
> > + */
> > + if (se->avg.last_update_time)
> > + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > + &se->avg, 0, 0, NULL);
> > +
> > se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > cfs_rq->avg.load_avg += se->avg.load_avg;
> > cfs_rq->avg.load_sum += se->avg.load_sum;
>
> you seem to have forgotten to remove the same logic from
> enqueue_entity_load_avg(), which will now call __update_load_avg()
> twice.
I am sorry, but I am not catching what you want to say.
enqueue_entity_load_avg() calls __update_load_avg() in the normal case,
that is, when the se has not changed its cfs_rq, while in the case where
it has changed its cfs_rq, we only need to attach the average load,
assuming that __update_load_avg() was already called at the time its
cfs_rq was changed.

Is there anything I missed? Could you let me know if so?
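To spell out how I read the two call paths after this series, here is a
stand-alone toy model in plain C (the toy_* names and the simplified
bookkeeping are my own, not the kernel's):

    #include <stdio.h>
    #include <stdbool.h>

    struct toy_se { unsigned long long last_update_time; };

    static void toy_update_load_avg(struct toy_se *se, unsigned long long now)
    {
        printf("decay se from %llu to %llu\n", se->last_update_time, now);
        se->last_update_time = now;
    }

    /* Models attach_entity_load_avg(): decay only if the old clock is usable. */
    static void toy_attach(struct toy_se *se, unsigned long long cfs_rq_now)
    {
        if (se->last_update_time)
            toy_update_load_avg(se, cfs_rq_now);
        se->last_update_time = cfs_rq_now;
        printf("add se load into cfs_rq\n");
    }

    /* Models enqueue_entity_load_avg(): decay on the normal path only. */
    static void toy_enqueue(struct toy_se *se, unsigned long long cfs_rq_now)
    {
        bool migrated = !se->last_update_time;

        if (!migrated)
            toy_update_load_avg(se, cfs_rq_now);
        if (migrated)
            toy_attach(se, cfs_rq_now);     /* attach skips the decay again */
    }

    int main(void)
    {
        struct toy_se migrated_se = { 0 };      /* cfs_rq changed: clock reset */
        struct toy_se switched_se = { 100 };    /* switched_to_fair: clock still valid */

        toy_enqueue(&migrated_se, 200);         /* one decay at most, never two */
        toy_attach(&switched_se, 200);          /* decays once over the detached time */
        return 0;
    }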
On Thu, Aug 20, 2015 at 03:17:21AM +0200, Peter Zijlstra wrote:
>
> I did something like this on top.. please have a look at the XXX and
> integrate.
Yes, I will do it.
On Thu, Aug 20, 2015 at 04:11:06AM +0800, Yuyang Du wrote:
> On Wed, Aug 19, 2015 at 07:12:41PM +0200, Peter Zijlstra wrote:
> > On Wed, Aug 19, 2015 at 03:47:15PM +0900, [email protected] wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 1be042a..3419f6c 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> > >
> > > static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > {
> > > + /*
> > > + * in case of migration and cgroup-change, more care should be taken
> > > + * because se's cfs_rq was changed, that means calling __update_load_avg
> > > + * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
> > > + * update here. we have to update it with prev cfs_rq just before changing
> > > + * se's cfs_rq, and get here soon.
> > > + */
> > > + if (se->avg.last_update_time)
> > > + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > > + &se->avg, 0, 0, NULL);
> > > +
> > > se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > > cfs_rq->avg.load_avg += se->avg.load_avg;
> > > cfs_rq->avg.load_sum += se->avg.load_sum;
> >
> > you seem to have forgotten to remove the same logic from
> > enqueue_entity_load_avg(), which will now call __update_load_avg()
> > twice.
>
> In case of enqueue_entity_load_avg(), that seems to be ok.
>
> However, the problem is that he made it "entangled":
>
> In enqueue_entity_load_avg():
>
> if (migrated)
> attach_entity_load_avg();
>
> while in attach_entity_load_avg():
>
> if (!migrated)
> __update_load_avg();
>
> so, if attach() is called from enqueue(), that if() is never true.
In the case of migration, we should not call __update_load_avg() when
attaching, because se->avg.last_update_time is not meaningful on the
changed cfs_rq. That is why attach_entity_load_avg() calls
__update_load_avg() conditionally.

enqueue() calls attach() only when migrating, so that if() is indeed
never true there. I think that is normal. Is it a problem?
>
> To Byungchul,
>
> 1) I suggest you not entangle the entire series by mixing problem
I don't think I mixed any problems; I solved each problem individually.
> sovling with code manipulating. That said, it is better you
> first solve the "move between task group" problem and the
> switch_to/from problem (if it is a problem, either way, comment
Patches 2 and 3 solve the former, and patch 4 solves the latter.
> with your explanation to how you deal with the lost record and why).
The way I deal with the lost record is the "decaying", as you can easily
see. I think it is the most natural way to handle it.
> 2) After that, make the code cleaner, without change to logic, especially
> avoid entangling the logic in order to do the code manipulation.
I should modify the patches if I entangled the code; could you tell me
where I did that?
> 3) If you don't hate upper case letter, use it properly.
I will try it.
Thanks,
Byungchul
>
> If it helps.
>
> Thanks,
> Yuyang
On Thu, Aug 20, 2015 at 03:17:21AM +0200, Peter Zijlstra wrote:
>
> I did something like this on top.. please have a look at the XXX and
> integrate.
I am not sure what you intend for me to do.

Do you mean that I am supposed to integrate this cleanup patch you gave
me, including the XXX comment?
> + *
> + * XXX this appears wrong!! check history,
> + * we appear to always set queued and RUNNING under the same lock instance
> + * might be from before TASK_WAKING ?
> */
Is it impossible for the check of whether vruntime is normalized to run
while doing something like active load balance, where
queued != TASK_ON_RQ_QUEUED and p->state == TASK_RUNNING?

I think it can happen.
Thanks,
Byungchul
On Thu, Aug 20, 2015 at 05:38:41PM +0900, Byungchul Park wrote:
> On Thu, Aug 20, 2015 at 03:17:21AM +0200, Peter Zijlstra wrote:
> >
> > I did something like this on top.. please have a look at the XXX and
> > integrate.
>
> i am not sure, what do you intend for me to do.
>
> do you mean that i am supposed to integrate this cleanup patch you gave me
> including the XXX comment?
>
> > + *
> > + * XXX this appears wrong!! check history,
> > + * we appear to always set queued and RUNNING under the same lock instance
> > + * might be from before TASK_WAKING ?
> > */
>
> is it impossible to happen to check if vruntime is normalized, when doing
> something like e.g. active load balance where queued != TASK_ON_RQ_QUEUED
> and p->state == TASK_RUNNING?
Furthermore, it seems possible in any migration done by load balance.
>
> i think it can happen..
>
> thanks,
> byungchul
On Thu, Aug 20, 2015 at 07:46:09PM +0900, Byungchul Park wrote:
> On Thu, Aug 20, 2015 at 05:38:41PM +0900, Byungchul Park wrote:
> > On Thu, Aug 20, 2015 at 03:17:21AM +0200, Peter Zijlstra wrote:
> > >
> > > I did something like this on top.. please have a look at the XXX and
> > > integrate.
> >
> > i am not sure, what do you intend for me to do.
> >
> > do you mean that i am supposed to integrate this cleanup patch you gave me
> > including the XXX comment?
No, the intent was for you to think about the point marked XXX, which
you've done below.
> > > + *
> > > + * XXX this appears wrong!! check history,
> > > + * we appear to always set queued and RUNNING under the same lock instance
> > > + * might be from before TASK_WAKING ?
> > > */
> >
> > is it impossible to happen to check if vruntime is normalized, when doing
> > something like e.g. active load balance where queued != TASK_ON_RQ_QUEUED
> > and p->state == TASK_RUNNING?
>
> furthermore, in any migration by load balance, it seems to be possible..
>
> >
> > i think it can happen..
OK, then we need to change the comment to reflect the actual reason the
test is needed. Because I think the currently described scenario is
incorrect.
On Thu, Aug 20, 2015 at 04:11:06AM +0800, Yuyang Du wrote:
> On Wed, Aug 19, 2015 at 07:12:41PM +0200, Peter Zijlstra wrote:
> > On Wed, Aug 19, 2015 at 03:47:15PM +0900, [email protected] wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 1be042a..3419f6c 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> > >
> > > static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > {
> > > + /*
> > > + * in case of migration and cgroup-change, more care should be taken
> > > + * because se's cfs_rq was changed, that means calling __update_load_avg
> > > + * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
> > > + * update here. we have to update it with prev cfs_rq just before changing
> > > + * se's cfs_rq, and get here soon.
> > > + */
> > > + if (se->avg.last_update_time)
> > > + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > > + &se->avg, 0, 0, NULL);
> > > +
> > > se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > > cfs_rq->avg.load_avg += se->avg.load_avg;
> > > cfs_rq->avg.load_sum += se->avg.load_sum;
> >
> > you seem to have forgotten to remove the same logic from
> > enqueue_entity_load_avg(), which will now call __update_load_avg()
> > twice.
>
> In case of enqueue_entity_load_avg(), that seems to be ok.
>
> However, the problem is that he made it "entangled":
>
> In enqueue_entity_load_avg():
>
> if (migrated)
> attach_entity_load_avg();
>
> while in attach_entity_load_avg():
>
> if (!migrated)
> __update_load_avg();
>
> so, if attach() is called from enqueue(), that if() is never true.
Right, I noticed the same yesterday when I took a second look at that
stuff. It was a little confusing indeed.
On Thu, Aug 20, 2015 at 11:13:23PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 20, 2015 at 04:11:06AM +0800, Yuyang Du wrote:
> > On Wed, Aug 19, 2015 at 07:12:41PM +0200, Peter Zijlstra wrote:
> > > On Wed, Aug 19, 2015 at 03:47:15PM +0900, [email protected] wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 1be042a..3419f6c 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -2711,6 +2711,17 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> > > >
> > > > static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > > {
> > > > + /*
> > > > + * in case of migration and cgroup-change, more care should be taken
> > > > + * because se's cfs_rq was changed, that means calling __update_load_avg
> > > > + * with new cfs_rq->avg.last_update_time is meaningless. so we skip the
> > > > + * update here. we have to update it with prev cfs_rq just before changing
> > > > + * se's cfs_rq, and get here soon.
> > > > + */
> > > > + if (se->avg.last_update_time)
> > > > + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > > > + &se->avg, 0, 0, NULL);
> > > > +
> > > > se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > > > cfs_rq->avg.load_avg += se->avg.load_avg;
> > > > cfs_rq->avg.load_sum += se->avg.load_sum;
> > >
> > > you seem to have forgotten to remove the same logic from
> > > enqueue_entity_load_avg(), which will now call __update_load_avg()
> > > twice.
> >
> > In case of enqueue_entity_load_avg(), that seems to be ok.
> >
> > However, the problem is that he made it "entangled":
> >
> > In enqueue_entity_load_avg():
> >
> > if (migrated)
> > attach_entity_load_avg();
> >
> > while in attach_entity_load_avg():
> >
> > if (!migrated)
> > __update_load_avg();
> >
> > so, if attach() is called from enqueue(), that if() is never true.
>
> Right, I noticed the same yesterday when I took a second look at that
> stuff. It was a little confusing indeed.
Please don't couple the functions to each other; I believe each one is
acceptable if you look at it by itself.

In other words, attach_entity_load_avg() must call __update_load_avg()
under that condition, whoever calls attach_entity_load_avg(), and the
same goes for enqueue_entity_load_avg().
On Thu, Aug 20, 2015 at 11:11:31PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 20, 2015 at 07:46:09PM +0900, Byungchul Park wrote:
> > On Thu, Aug 20, 2015 at 05:38:41PM +0900, Byungchul Park wrote:
> > > On Thu, Aug 20, 2015 at 03:17:21AM +0200, Peter Zijlstra wrote:
> > > >
> > > > I did something like this on top.. please have a look at the XXX and
> > > > integrate.
> > >
> > > i am not sure, what do you intend for me to do.
> > >
> > > do you mean that i am supposed to integrate this cleanup patch you gave me
> > > including the XXX comment?
>
> No, the intent was for you to think about the point marked XXX, which
> you've done below.
>
> > > > + *
> > > > + * XXX this appears wrong!! check history,
> > > > + * we appear to always set queued and RUNNING under the same lock instance
> > > > + * might be from before TASK_WAKING ?
> > > > */
> > >
> > > is it impossible to happen to check if vruntime is normalized, when doing
> > > something like e.g. active load balance where queued != TASK_ON_RQ_QUEUED
> > > and p->state == TASK_RUNNING?
> >
> > furthermore, in any migration by load balance, it seems to be possible..
> >
> > >
> > > i think it can happen..
>
> OK, then we need to change the comment to reflect the actual reason the
> test is needed. Because I think the currently described scenario is
> incorrect.
Which is the currently described scenario that needs to change?

And did I change the patches in v4 as you suggested?