When a runqueue runs out of RT tasks, it may still have non-RT tasks
to run or no tasks at all (idle). Currently, RT balancing treats the
two cases the same and manipulates only cpupri.pri_to_cpu[CPUPRI_NORMAL],
which may cause problems.
For instance, on a 4-cpu system: non-RT task1 is running on cpu0, RT
task2 is running on cpu3, and cpu1/cpu2 are both idle. Then RT task3
(usually CPU-intensive) is woken up or created on cpu3; it will be
placed on cpu0 (see find_lowest_rq()), causing task1 to starve until
the cfs load balancer moves task1 to another cpu, or indefinitely if
task1 is bound to cpu0. It would be more reasonable to put task3 on
cpu1 or cpu2, which are idle (even though doing so may break an
energy-saving idle state).
This patch tackles the problem by updating cpupri's
pri_to_cpu[CPUPRI_IDLE] as the idle task is scheduled in and out, so
that find_lowest_rq(), used when pushing RT tasks or selecting a
runqueue for them, will prefer an idle cpu as the target.
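For reference, the MAX_PRIO/MAX_RT_PRIO values passed to cpupri_set()
below map onto CPUPRI_IDLE/CPUPRI_NORMAL through convert_prio() in
kernel/sched/cpupri.c, which (as it exists today, not part of this
patch) reads roughly:

	static int convert_prio(int prio)
	{
		int cpupri;

		if (prio == CPUPRI_INVALID)
			cpupri = CPUPRI_INVALID;
		else if (prio == MAX_PRIO)
			cpupri = CPUPRI_IDLE;
		else if (prio >= MAX_RT_PRIO)
			cpupri = CPUPRI_NORMAL;
		else
			cpupri = MAX_RT_PRIO - prio + 1;

		return cpupri;
	}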
Signed-off-by: pang.xunlei <[email protected]>
---
kernel/sched/idle_task.c | 3 +++
kernel/sched/rt.c | 21 +++++++++++++++++++++
kernel/sched/sched.h | 6 ++++++
3 files changed, 30 insertions(+)
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 67ad4e7..e053347 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -26,6 +26,8 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
static struct task_struct *
pick_next_task_idle(struct rq *rq, struct task_struct *prev)
{
+ idle_enter_rt(rq);
+
put_prev_task(rq, prev);
schedstat_inc(rq, sched_goidle);
@@ -47,6 +49,7 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
{
+ idle_exit_rt(rq);
idle_exit_fair(rq);
rq_last_tick_reset(rq);
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d024e6c..da6922e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -992,6 +992,27 @@ enqueue_top_rt_rq(struct rt_rq *rt_rq)
#if defined CONFIG_SMP
+/* Set CPUPRI_IDLE bitmap for this cpu when entering idle. */
+void idle_enter_rt(struct rq *this_rq)
+{
+ struct cpupri *cp = &this_rq->rd->cpupri;
+ int currpri = cp->cpu_to_pri[this_rq->cpu];
+
+ BUG_ON(currpri != CPUPRI_NORMAL);
+ cpupri_set(cp, this_rq->cpu, MAX_PRIO);
+}
+
+/* Set CPUPRI_NORMAL bitmap for this cpu when exiting from idle. */
+void idle_exit_rt(struct rq *this_rq)
+{
+ struct cpupri *cp = &this_rq->rd->cpupri;
+ int currpri = cp->cpu_to_pri[this_rq->cpu];
+
+ /* RT tasks may have been enqueued meanwhile, so this check is needed. */
+ if (currpri == CPUPRI_IDLE)
+ cpupri_set(cp, this_rq->cpu, MAX_RT_PRIO);
+}
+
static void
inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
{
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 24156c84..cc603fa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1162,11 +1162,17 @@ extern void update_group_capacity(struct sched_domain *sd, int cpu);
extern void trigger_load_balance(struct rq *rq);
+extern void idle_enter_rt(struct rq *this_rq);
+extern void idle_exit_rt(struct rq *this_rq);
+
extern void idle_enter_fair(struct rq *this_rq);
extern void idle_exit_fair(struct rq *this_rq);
#else
+static inline void idle_enter_rt(struct rq *rq) { }
+static inline void idle_exit_rt(struct rq *rq) { }
+
static inline void idle_enter_fair(struct rq *rq) { }
static inline void idle_exit_fair(struct rq *rq) { }
--
1.7.9.5
When selecting the cpu for a waking RT task, if curr is a non-RT task
that is bound to this cpu only, give the RT task a chance to be placed
on a different cpu (an idle cpu will certainly be chosen if one
exists) so that curr is not starved.
Signed-off-by: pang.xunlei <[email protected]>
---
kernel/sched/rt.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index da6922e..dc1f7f0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1340,6 +1340,11 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
* runqueue. Otherwise simply start this RT task
* on its current runqueue.
*
+ * If the current task on @p's runqueue is a non-RT task,
+ * and this task is bound on current runqueue, then try to
+ * see if we can wake this RT task up on a different runqueue,
+ * we will definitely find an idle cpu if there is any.
+ *
* We want to avoid overloading runqueues. If the woken
* task is a higher priority, then it will stay on this CPU
* and the lower prio task should be moved to another CPU.
@@ -1356,9 +1361,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
* This test is optimistic, if we get it wrong the load-balancer
* will have to sort it out.
*/
- if (curr && unlikely(rt_task(curr)) &&
- (curr->nr_cpus_allowed < 2 ||
- curr->prio <= p->prio)) {
+ if (curr && unlikely(curr->nr_cpus_allowed < 2 ||
+ curr->prio <= p->prio)) {
int target = find_lowest_rq(p);
if (target != -1)
--
1.7.9.5
When a runqueue runs out of DL tasks, it may still have RT tasks or
non-RT tasks to run, or be idle. It is better to put a DL task on an
idle cpu, or failing that on a cpu with no RT tasks, if one exists.
Add idle_enter_dl()/idle_exit_dl() to detect the idle case.
Add rt_enter_dl()/rt_exit_dl() to detect the non-RT case.
This follows the same approach used for RT in the earlier patch.
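In sketch form, the lookup order in cpudl_find() for a later_mask
request becomes the following (the real change is in the diff below;
tmp_mask is p->cpus_allowed restricted to cpu_active_mask):

	/* Prefer idle cpus, then cpus with no RT tasks, then cpus with no DL tasks. */
	if (cpumask_and(later_mask, &tmp_mask, cp->idle_cpus) ||
	    cpumask_and(later_mask, &tmp_mask, cp->freert_cpus) ||
	    cpumask_and(later_mask, &tmp_mask, cp->freedl_cpus)) {
		best_cpu = cpumask_any(later_mask);
		goto out;
	}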
Signed-off-by: pang.xunlei <[email protected]>
---
kernel/sched/cpudeadline.c | 79 +++++++++++++++++++++++++++++++++++---------
kernel/sched/cpudeadline.h | 13 ++++++--
kernel/sched/deadline.c | 32 +++++++++++++++---
kernel/sched/idle_task.c | 2 ++
kernel/sched/rt.c | 7 ++++
kernel/sched/sched.h | 11 ++++++
6 files changed, 121 insertions(+), 23 deletions(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 539ca3c..d5ebc34 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -106,10 +106,24 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
{
int best_cpu = -1;
const struct sched_dl_entity *dl_se = &p->dl;
+ struct cpumask tmp_mask;
- if (later_mask && cpumask_and(later_mask, later_mask, cp->free_cpus)) {
- best_cpu = cpumask_any(later_mask);
- goto out;
+ if (later_mask) {
+ cpumask_and(&tmp_mask, &p->cpus_allowed, cpu_active_mask);
+ if (cpumask_and(later_mask, &tmp_mask, cp->idle_cpus)) {
+ best_cpu = cpumask_any(later_mask);
+ goto out;
+ }
+
+ if (cpumask_and(later_mask, &tmp_mask, cp->freert_cpus)) {
+ best_cpu = cpumask_any(later_mask);
+ goto out;
+ }
+
+ if (cpumask_and(later_mask, &tmp_mask, cp->freedl_cpus)) {
+ best_cpu = cpumask_any(later_mask);
+ goto out;
+ }
} else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
best_cpu = cpudl_maximum(cp);
@@ -128,12 +142,12 @@ out:
* @cp: the cpudl max-heap context
* @cpu: the target cpu
* @dl: the new earliest deadline for this cpu
- *
+ * @set_flags: CPUDL_SET_XXX, CPUDL_CLEAR_XXX
* Notes: assumes cpu_rq(cpu)->lock is locked
*
* Returns: (void)
*/
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int set_flags)
{
int old_idx, new_cpu;
unsigned long flags;
@@ -141,8 +155,25 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
WARN_ON(!cpu_present(cpu));
raw_spin_lock_irqsave(&cp->lock, flags);
+ switch (set_flags) {
+ case CPUDL_SET_IDLE:
+ cpumask_set_cpu(cpu, cp->idle_cpus);
+ goto out;
+ case CPUDL_CLEAR_IDLE:
+ cpumask_clear_cpu(cpu, cp->idle_cpus);
+ goto out;
+ case CPUDL_SET_FREERT:
+ cpumask_set_cpu(cpu, cp->freert_cpus);
+ goto out;
+ case CPUDL_CLEAR_FREERT:
+ cpumask_clear_cpu(cpu, cp->freert_cpus);
+ goto out;
+ default:
+ break;
+ }
+
old_idx = cp->elements[cpu].idx;
- if (!is_valid) {
+ if (set_flags == CPUDL_SET_FREEDL) {
/* remove item */
if (old_idx == IDX_INVALID) {
/*
@@ -164,8 +195,8 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cpudl_exchange(cp, old_idx, parent(old_idx));
old_idx = parent(old_idx);
}
- cpumask_set_cpu(cpu, cp->free_cpus);
- cpudl_heapify(cp, old_idx);
+ cpumask_set_cpu(cpu, cp->freedl_cpus);
+ cpudl_heapify(cp, old_idx);
goto out;
}
@@ -176,7 +207,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->elements[cp->size - 1].cpu = cpu;
cp->elements[cpu].idx = cp->size - 1;
cpudl_change_key(cp, cp->size - 1, dl);
- cpumask_clear_cpu(cpu, cp->free_cpus);
+ cpumask_clear_cpu(cpu, cp->freedl_cpus);
} else {
cpudl_change_key(cp, old_idx, dl);
}
@@ -201,19 +232,33 @@ int cpudl_init(struct cpudl *cp)
sizeof(struct cpudl_item),
GFP_KERNEL);
if (!cp->elements)
- return -ENOMEM;
+ goto out;
+
+ if (!alloc_cpumask_var(&cp->freedl_cpus, GFP_KERNEL))
+ goto free_elements;
+
+ if (!zalloc_cpumask_var(&cp->freert_cpus, GFP_KERNEL))
+ goto free_freedl_cpus;
+
+ if (!zalloc_cpumask_var(&cp->idle_cpus, GFP_KERNEL))
+ goto free_freert_cpus;
- if (!alloc_cpumask_var(&cp->free_cpus, GFP_KERNEL)) {
- kfree(cp->elements);
- return -ENOMEM;
- }
for_each_possible_cpu(i)
cp->elements[i].idx = IDX_INVALID;
- cpumask_setall(cp->free_cpus);
+ cpumask_setall(cp->freedl_cpus);
return 0;
+
+free_freert_cpus:
+ free_cpumask_var(cp->freert_cpus);
+free_freedl_cpus:
+ free_cpumask_var(cp->freedl_cpus);
+free_elements:
+ kfree(cp->elements);
+out:
+ return -ENOMEM;
}
/*
@@ -222,6 +267,8 @@ int cpudl_init(struct cpudl *cp)
*/
void cpudl_cleanup(struct cpudl *cp)
{
- free_cpumask_var(cp->free_cpus);
+ free_cpumask_var(cp->freedl_cpus);
+ free_cpumask_var(cp->freert_cpus);
+ free_cpumask_var(cp->idle_cpus);
kfree(cp->elements);
}
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index 538c979..d79e4d8 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -5,6 +5,13 @@
#define IDX_INVALID -1
+#define CPUDL_SET_DL 1 /* set deadline value, clear freedl_cpus */
+#define CPUDL_SET_FREEDL 2 /* set freedl_cpus */
+#define CPUDL_SET_FREERT 3 /* set freert_cpus */
+#define CPUDL_CLEAR_FREERT 4 /* clear freert_cpus */
+#define CPUDL_SET_IDLE 5 /* set idle_cpus */
+#define CPUDL_CLEAR_IDLE 6 /* clear idle_cpus */
+
struct cpudl_item {
u64 dl;
int cpu;
@@ -14,7 +21,9 @@ struct cpudl_item {
struct cpudl {
raw_spinlock_t lock;
int size;
- cpumask_var_t free_cpus;
+ cpumask_var_t idle_cpus;
+ cpumask_var_t freert_cpus;
+ cpumask_var_t freedl_cpus;
struct cpudl_item *elements;
};
@@ -22,7 +31,7 @@ struct cpudl {
#ifdef CONFIG_SMP
int cpudl_find(struct cpudl *cp, struct task_struct *p,
struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int set_flags);
int cpudl_init(struct cpudl *cp);
void cpudl_cleanup(struct cpudl *cp);
#else
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5285332..7b0b2d2 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -673,6 +673,26 @@ static void update_curr_dl(struct rq *rq)
#ifdef CONFIG_SMP
+void idle_enter_dl(struct rq *this_rq)
+{
+ cpudl_set(&this_rq->rd->cpudl, this_rq->cpu, 0, CPUDL_SET_IDLE);
+}
+
+void idle_exit_dl(struct rq *this_rq)
+{
+ cpudl_set(&this_rq->rd->cpudl, this_rq->cpu, 0, CPUDL_CLEAR_IDLE);
+}
+
+void rt_enter_dl(struct rq *this_rq)
+{
+ cpudl_set(&this_rq->rd->cpudl, this_rq->cpu, 0, CPUDL_CLEAR_FREERT);
+}
+
+void rt_exit_dl(struct rq *this_rq)
+{
+ cpudl_set(&this_rq->rd->cpudl, this_rq->cpu, 0, CPUDL_SET_FREERT);
+}
+
static struct task_struct *pick_next_earliest_dl_task(struct rq *rq, int cpu);
static inline u64 next_deadline(struct rq *rq)
@@ -699,7 +719,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
*/
dl_rq->earliest_dl.next = dl_rq->earliest_dl.curr;
dl_rq->earliest_dl.curr = deadline;
- cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, CPUDL_SET_DL);
} else if (dl_rq->earliest_dl.next == 0 ||
dl_time_before(deadline, dl_rq->earliest_dl.next)) {
/*
@@ -723,7 +743,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
if (!dl_rq->dl_nr_running) {
dl_rq->earliest_dl.curr = 0;
dl_rq->earliest_dl.next = 0;
- cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, 0, CPUDL_SET_FREEDL);
} else {
struct rb_node *leftmost = dl_rq->rb_leftmost;
struct sched_dl_entity *entry;
@@ -731,7 +751,8 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
dl_rq->earliest_dl.curr = entry->deadline;
dl_rq->earliest_dl.next = next_deadline(rq);
- cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
+ cpudl_set(&rq->rd->cpudl, rq->cpu,
+ entry->deadline, CPUDL_SET_DL);
}
}
@@ -1563,7 +1584,8 @@ static void rq_online_dl(struct rq *rq)
dl_set_overload(rq);
if (rq->dl.dl_nr_running > 0)
- cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
+ cpudl_set(&rq->rd->cpudl, rq->cpu,
+ rq->dl.earliest_dl.curr, CPUDL_SET_DL);
}
/* Assumes rq->lock is held */
@@ -1572,7 +1594,7 @@ static void rq_offline_dl(struct rq *rq)
if (rq->dl.overloaded)
dl_clear_overload(rq);
- cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, 0, CPUDL_SET_FREEDL);
}
void init_sched_dl_class(void)
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index e053347..7838e56 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -26,6 +26,7 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
static struct task_struct *
pick_next_task_idle(struct rq *rq, struct task_struct *prev)
{
+ idle_enter_dl(rq);
idle_enter_rt(rq);
put_prev_task(rq, prev);
@@ -49,6 +50,7 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
{
+ idle_exit_dl(rq);
idle_exit_rt(rq);
idle_exit_fair(rq);
rq_last_tick_reset(rq);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index dc1f7f0..a5bcded 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1488,6 +1488,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
if (!rt_rq->rt_queued)
return NULL;
+ if (prev->sched_class != &rt_sched_class)
+ rt_enter_dl(rq);
+
put_prev_task(rq, prev);
p = _pick_next_task_rt(rq);
@@ -1502,6 +1505,10 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
{
+ /* Ignore preemption by the stop class; for dl preemption it does not matter. */
+ if (rq->curr->sched_class != &rt_sched_class)
+ rt_exit_dl(rq);
+
update_curr_rt(rq);
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc603fa..b76dfef 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1162,6 +1162,12 @@ extern void update_group_capacity(struct sched_domain *sd, int cpu);
extern void trigger_load_balance(struct rq *rq);
+extern void rt_enter_dl(struct rq *this_rq);
+extern void rt_exit_dl(struct rq *this_rq);
+
+extern void idle_enter_dl(struct rq *this_rq);
+extern void idle_exit_dl(struct rq *this_rq);
+
extern void idle_enter_rt(struct rq *this_rq);
extern void idle_exit_rt(struct rq *this_rq);
@@ -1169,6 +1175,11 @@ extern void idle_enter_fair(struct rq *this_rq);
extern void idle_exit_fair(struct rq *this_rq);
#else
+static inline void rt_enter_dl(struct rq *rq) { }
+static inline void rt_exit_dl(struct rq *rq) { }
+
+static inline void idle_enter_dl(struct rq *rq) { }
+static inline void idle_exit_dl(struct rq *rq) { }
static inline void idle_enter_rt(struct rq *rq) { }
static inline void idle_exit_rt(struct rq *rq) { }
--
1.7.9.5
Actually, cpupri_set() and cpupri_init() can never be used without
CONFIG_SMP.
Signed-off-by: pang.xunlei <[email protected]>
---
kernel/sched/cpupri.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
index 6b03334..63cbb9c 100644
--- a/kernel/sched/cpupri.h
+++ b/kernel/sched/cpupri.h
@@ -26,9 +26,6 @@ int cpupri_find(struct cpupri *cp,
void cpupri_set(struct cpupri *cp, int cpu, int pri);
int cpupri_init(struct cpupri *cp);
void cpupri_cleanup(struct cpupri *cp);
-#else
-#define cpupri_set(cp, cpu, pri) do { } while (0)
-#define cpupri_init() do { } while (0)
#endif
#endif /* _LINUX_CPUPRI_H */
--
1.7.9.5
When selecting the cpu for a waking DL task, if curr is a non-DL task
that is bound to this cpu only, give the DL task a chance to be placed
on a different cpu so that curr is not starved.
Signed-off-by: pang.xunlei <[email protected]>
---
kernel/sched/deadline.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7b0b2d2..1f64d4a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
struct task_struct *curr;
struct rq *rq;
+ if (p->nr_cpus_allowed == 1)
+ goto out;
+
if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
goto out;
@@ -970,11 +973,14 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
* can!) we prefer to send it somewhere else. On the
* other hand, if it has a shorter deadline, we
* try to make it stay here, it might be important.
+ *
+ * If the current task on @p's runqueue is a non-DL task,
+ * and this task is bound on current runqueue, then try to
+ * see if we can wake this DL task up on a different runqueue,
*/
- if (unlikely(dl_task(curr)) &&
- (curr->nr_cpus_allowed < 2 ||
- !dl_entity_preempt(&p->dl, &curr->dl)) &&
- (p->nr_cpus_allowed > 1)) {
+ if (unlikely(curr->nr_cpus_allowed < 2) ||
+ unlikely(dl_task(curr) &&
+ !dl_entity_preempt(&p->dl, &curr->dl))) {
int target = find_later_rq(p);
if (target != -1)
--
1.7.9.5
Actually, cpudl_set() and cpudl_init() can never be used without
CONFIG_SMP.
Signed-off-by: pang.xunlei <[email protected]>
---
kernel/sched/cpudeadline.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index d79e4d8..7096e5a 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -34,9 +34,6 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int set_flags);
int cpudl_init(struct cpudl *cp);
void cpudl_cleanup(struct cpudl *cp);
-#else
-#define cpudl_set(cp, cpu, dl) do { } while (0)
-#define cpudl_init() do { } while (0)
#endif /* CONFIG_SMP */
#endif /* _LINUX_CPUDL_H */
--
1.7.9.5
On 14/11/4 7:13 PM, pang.xunlei wrote:
> When selecting the cpu for a waking DL task, if curr is a non-DL
> task which is bound only on this cpu, then we can give it a chance
> to select a different cpu for this DL task to avoid curr starving.
>
> Signed-off-by: pang.xunlei <[email protected]>
> ---
> kernel/sched/deadline.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 7b0b2d2..1f64d4a 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> struct task_struct *curr;
> struct rq *rq;
>
> + if (p->nr_cpus_allowed == 1)
> + goto out;
> +
> if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> goto out;
I don't think you are using the right branch of the tip tree.
Regards,
Wanpeng Li
>
> @@ -970,11 +973,14 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> * can!) we prefer to send it somewhere else. On the
> * other hand, if it has a shorter deadline, we
> * try to make it stay here, it might be important.
> + *
> + * If the current task on @p's runqueue is a non-DL task,
> + * and this task is bound on current runqueue, then try to
> + * see if we can wake this DL task up on a different runqueue,
> */
> - if (unlikely(dl_task(curr)) &&
> - (curr->nr_cpus_allowed < 2 ||
> - !dl_entity_preempt(&p->dl, &curr->dl)) &&
> - (p->nr_cpus_allowed > 1)) {
> + if (unlikely(curr->nr_cpus_allowed < 2) ||
> + unlikely(dl_task(curr) &&
> + !dl_entity_preempt(&p->dl, &curr->dl))) {
> int target = find_later_rq(p);
>
> if (target != -1)
>
> When selecting the cpu for a waking RT task, if curr is a non-RT
> task which is bound only on this cpu, then we can give it a chance
> to select a different cpu(definitely an idle cpu if existing) for
> the RT task to avoid curr starving.
>
> Signed-off-by: pang.xunlei <[email protected]>
> ---
> kernel/sched/rt.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index da6922e..dc1f7f0 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1340,6 +1340,11 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> * runqueue. Otherwise simply start this RT task
> * on its current runqueue.
> *
> + * If the current task on @p's runqueue is a non-RT task,
> + * and this task is bound on current runqueue, then try to
> + * see if we can wake this RT task up on a different runqueue,
> + * we will definitely find an idle cpu if there is any.
> + *
> * We want to avoid overloading runqueues. If the woken
> * task is a higher priority, then it will stay on this CPU
> * and the lower prio task should be moved to another CPU.
> @@ -1356,9 +1361,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> * This test is optimistic, if we get it wrong the load-balancer
> * will have to sort it out.
> */
> - if (curr && unlikely(rt_task(curr)) &&
> - (curr->nr_cpus_allowed < 2 ||
> - curr->prio <= p->prio)) {
> + if (curr && unlikely(curr->nr_cpus_allowed < 2 ||
> + curr->prio <= p->prio)) {
Nack, there is no point in comparing apples to oranges.
Hillf
> int target = find_lowest_rq(p);
>
> if (target != -1)
> --
> 1.7.9.5
On Tue, 4 Nov 2014 19:13:01 +0800
"pang.xunlei" <[email protected]> wrote:
> When selecting the cpu for a waking RT task, if curr is a non-RT
> task which is bound only on this cpu, then we can give it a chance
> to select a different cpu(definitely an idle cpu if existing) for
> the RT task to avoid curr starving.
Absolutely not! An RT task doesn't give a crap if a non RT task is
bound to a CPU or not. We are not going to migrate an RT task to be
nice to a bounded non-RT task.
Migration is not cheap. It causes cache misses and TLB flushes. This is
not something that should be taken lightly.
Nack
-- Steve
On 4 November 2014 19:24, Wanpeng Li <[email protected]> wrote:
>
> On 14/11/4 7:13 PM, pang.xunlei wrote:
>>
>> When selecting the cpu for a waking DL task, if curr is a non-DL
>> task which is bound only on this cpu, then we can give it a chance
>> to select a different cpu for this DL task to avoid curr starving.
>>
>> Signed-off-by: pang.xunlei <[email protected]>
>> ---
>> kernel/sched/deadline.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 7b0b2d2..1f64d4a 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int
>> sd_flag, int flags)
>> struct task_struct *curr;
>> struct rq *rq;
>> + if (p->nr_cpus_allowed == 1)
>> + goto out;
>> +
>> if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
>> goto out;
>
>
> I don't think you use right branch of tip tree.
Hi Wanpeng, I'm using linux-3.18-rc3 as the base; is there something
wrong with that? Please point it out if so.
Thanks!
>
> Regards,
> Wanpeng Li
>
>> @@ -970,11 +973,14 @@ select_task_rq_dl(struct task_struct *p, int cpu,
>> int sd_flag, int flags)
>> * can!) we prefer to send it somewhere else. On the
>> * other hand, if it has a shorter deadline, we
>> * try to make it stay here, it might be important.
>> + *
>> + * If the current task on @p's runqueue is a non-DL task,
>> + * and this task is bound on current runqueue, then try to
>> + * see if we can wake this DL task up on a different runqueue,
>> */
>> - if (unlikely(dl_task(curr)) &&
>> - (curr->nr_cpus_allowed < 2 ||
>> - !dl_entity_preempt(&p->dl, &curr->dl)) &&
>> - (p->nr_cpus_allowed > 1)) {
>> + if (unlikely(curr->nr_cpus_allowed < 2) ||
>> + unlikely(dl_task(curr) &&
>> + !dl_entity_preempt(&p->dl, &curr->dl))) {
>> int target = find_later_rq(p);
>> if (target != -1)
>
>
On 4 November 2014 20:52, Steven Rostedt <[email protected]> wrote:
> On Tue, 4 Nov 2014 19:13:01 +0800
> "pang.xunlei" <[email protected]> wrote:
>
>> When selecting the cpu for a waking RT task, if curr is a non-RT
>> task which is bound only on this cpu, then we can give it a chance
>> to select a different cpu(definitely an idle cpu if existing) for
>> the RT task to avoid curr starving.
>
> Absolutely not! An RT task doesn't give a crap if a non RT task is
> bound to a CPU or not. We are not going to migrate an RT task to be
> nice to a bounded non-RT task.
>
> Migration is not cheap. It causes cache misses and TLB flushes. This is
> not something that should be taken lightly.
Ok, thanks!
But I think optimizing the PUSH operation as the former patch does is
still reasonable, since a PUSH itself already involves a migration.
Am I missing something?
>
> Nack
>
> -- Steve
>
On Tue, 4 Nov 2014 19:13:02 +0800
"pang.xunlei" <[email protected]> wrote:
> Actually, cpupri_set() and cpupri_init() can never be used without
> CONFIG_SMP.
>
Acked-by: Steven Rostedt <[email protected]>
-- Steve
> Signed-off-by: pang.xunlei <[email protected]>
> ---
> kernel/sched/cpupri.h | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
> index 6b03334..63cbb9c 100644
> --- a/kernel/sched/cpupri.h
> +++ b/kernel/sched/cpupri.h
> @@ -26,9 +26,6 @@ int cpupri_find(struct cpupri *cp,
> void cpupri_set(struct cpupri *cp, int cpu, int pri);
> int cpupri_init(struct cpupri *cp);
> void cpupri_cleanup(struct cpupri *cp);
> -#else
> -#define cpupri_set(cp, cpu, pri) do { } while (0)
> -#define cpupri_init() do { } while (0)
> #endif
>
> #endif /* _LINUX_CPUPRI_H */
On Tue, 4 Nov 2014 19:13:04 +0800
"pang.xunlei" <[email protected]> wrote:
> When selecting the cpu for a waking DL task, if curr is a non-DL
> task which is bound only on this cpu, then we can give it a chance
> to select a different cpu for this DL task to avoid curr starving.
>
> Signed-off-by: pang.xunlei <[email protected]>
> ---
> kernel/sched/deadline.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 7b0b2d2..1f64d4a 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> struct task_struct *curr;
> struct rq *rq;
>
> + if (p->nr_cpus_allowed == 1)
> + goto out;
> +
This looks fine, and I'm wondering if we shouldn't just move this into
kernel/sched/core.c: select_task_rq(). Why bother calling the select_rq
code if the task is pinned?
This change will make fair.c, rt.c, and deadline.c all start with the
same logic. If this should be an optimization, just move it to core.c
and be done with it.
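Roughly, the move would look like this in kernel/sched/core.c (a
sketch only, assuming the existing fallback check in select_task_rq()
stays untouched):

	static inline
	int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
	{
		/* Pinned tasks never need the class-specific placement logic. */
		if (p->nr_cpus_allowed > 1)
			cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);

		if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
			     !cpu_online(cpu)))
			cpu = select_fallback_rq(task_cpu(p), p);

		return cpu;
	}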
> if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> goto out;
>
> @@ -970,11 +973,14 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> * can!) we prefer to send it somewhere else. On the
> * other hand, if it has a shorter deadline, we
> * try to make it stay here, it might be important.
> + *
> + * If the current task on @p's runqueue is a non-DL task,
> + * and this task is bound on current runqueue, then try to
> + * see if we can wake this DL task up on a different runqueue,
> */
> - if (unlikely(dl_task(curr)) &&
> - (curr->nr_cpus_allowed < 2 ||
> - !dl_entity_preempt(&p->dl, &curr->dl)) &&
> - (p->nr_cpus_allowed > 1)) {
> + if (unlikely(curr->nr_cpus_allowed < 2) ||
> + unlikely(dl_task(curr) &&
> + !dl_entity_preempt(&p->dl, &curr->dl))) {
This has the same issue as the rt.c change.
-- Steve
> int target = find_later_rq(p);
>
> if (target != -1)
On Tue, 4 Nov 2014 22:29:24 +0800
"pang.xunlei" <[email protected]> wrote:
> > Migration is not cheap. It causes cache misses and TLB flushes. This is
> > not something that should be taken lightly.
> Ok, thanks!
> But I think the PUSH operation optimized by the former patch is reasonable,
> since PUSH itselft does involve the Migration. Do I miss something?
For the first patch you may be right, but I want to think about it some
more. I want to make sure we are not adding any other type of overhead
with the extra calls.
-- Steve
On 4 November 2014 22:47, Steven Rostedt <[email protected]> wrote:
> On Tue, 4 Nov 2014 22:29:24 +0800
> "pang.xunlei" <[email protected]> wrote:
>
>
>> > Migration is not cheap. It causes cache misses and TLB flushes. This is
>> > not something that should be taken lightly.
>> Ok, thanks!
>> But I think the PUSH operation optimized by the former patch is reasonable,
>> since PUSH itselft does involve the Migration. Do I miss something?
>
> For the first patch you may be right, but I want to think about it some
> more. I want to make sure we are not adding any other type of overhead
> with the extra calls.
Yes, this may add some overhead/latency to idle, especially to its
exit stage. If that can't be accepted, I think it can also be done
purely in find_lowest_rq() after cpupri_find(): for example, modify
cpupri_find() to return the matched pri_to_cpu[] index plus one
instead of just 1; then, if the returned index equals CPUPRI_NORMAL+1,
iterate over lowest_mask with something like a cpu_idle() check to
select an idle cpu.
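A rough sketch of that variant inside find_lowest_rq() in
kernel/sched/rt.c (task and lowest_mask are the existing locals
there), assuming cpupri_find() were changed to return the matched
index plus one and that idle_cpu() is the kind of check meant above:

	int ret, cpu;

	ret = cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask);
	if (!ret)
		return -1; /* No targets found */

	/* Only cpus running non-RT tasks were found: prefer a truly idle one. */
	if (ret == CPUPRI_NORMAL + 1) {
		for_each_cpu(cpu, lowest_mask) {
			if (idle_cpu(cpu))
				return cpu;
		}
	}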
>
> -- Steve
On Tue, Nov 04, 2014 at 09:45:49AM -0500, Steven Rostedt wrote:
> > +++ b/kernel/sched/deadline.c
> > @@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> > struct task_struct *curr;
> > struct rq *rq;
> >
> > + if (p->nr_cpus_allowed == 1)
> > + goto out;
> > +
>
> This looks fine, and I'm wondering if we shouldn't just move this into
> kernel/sched/core.c: select_task_rq(). Why bother calling the select_rq
> code if the task is pinned?
>
> This change will make fair.c, rt.c, and deadline.c all start with the
> same logic. If this should be an optimization, just move it to core.c
> and be done with it.
Yeah, that makes sense. Back when, in the olden days, nr_cpus_allowed
was specific to the rt class, but we fixed that.
Hi Xunlei,
On 14/11/4 10:19 PM, pang.xunlei wrote:
> On 4 November 2014 19:24, Wanpeng Li <[email protected]> wrote:
>> On 14/11/4 7:13 PM, pang.xunlei wrote:
>>> When selecting the cpu for a waking DL task, if curr is a non-DL
>>> task which is bound only on this cpu, then we can give it a chance
>>> to select a different cpu for this DL task to avoid curr starving.
>>>
>>> Signed-off-by: pang.xunlei <[email protected]>
>>> ---
>>> kernel/sched/deadline.c | 14 ++++++++++----
>>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>>> index 7b0b2d2..1f64d4a 100644
>>> --- a/kernel/sched/deadline.c
>>> +++ b/kernel/sched/deadline.c
>>> @@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int
>>> sd_flag, int flags)
>>> struct task_struct *curr;
>>> struct rq *rq;
>>> + if (p->nr_cpus_allowed == 1)
>>> + goto out;
>>> +
>>> if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
>>> goto out;
>>
>> I don't think you use right branch of tip tree.
> Hi Wanpeng, I'm using linux-3.18-rc3 as the base, does this have
> something wrong? please point me out if any.
I have already done this; my patch is merged in the tip tree, you can
check it.
Regards,
Wanpeng Li
> Thanks!
>> Regards,
>> Wanpeng Li
>>
>>> @@ -970,11 +973,14 @@ select_task_rq_dl(struct task_struct *p, int cpu,
>>> int sd_flag, int flags)
>>> * can!) we prefer to send it somewhere else. On the
>>> * other hand, if it has a shorter deadline, we
>>> * try to make it stay here, it might be important.
>>> + *
>>> + * If the current task on @p's runqueue is a non-DL task,
>>> + * and this task is bound on current runqueue, then try to
>>> + * see if we can wake this DL task up on a different runqueue,
>>> */
>>> - if (unlikely(dl_task(curr)) &&
>>> - (curr->nr_cpus_allowed < 2 ||
>>> - !dl_entity_preempt(&p->dl, &curr->dl)) &&
>>> - (p->nr_cpus_allowed > 1)) {
>>> + if (unlikely(curr->nr_cpus_allowed < 2) ||
>>> + unlikely(dl_task(curr) &&
>>> + !dl_entity_preempt(&p->dl, &curr->dl))) {
>>> int target = find_later_rq(p);
>>> if (target != -1)
>>
Hi Steven,
On 14/11/4 10:45 PM, Steven Rostedt wrote:
> On Tue, 4 Nov 2014 19:13:04 +0800
> "pang.xunlei" <[email protected]> wrote:
>
>> When selecting the cpu for a waking DL task, if curr is a non-DL
>> task which is bound only on this cpu, then we can give it a chance
>> to select a different cpu for this DL task to avoid curr starving.
>>
>> Signed-off-by: pang.xunlei <[email protected]>
>> ---
>> kernel/sched/deadline.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 7b0b2d2..1f64d4a 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -954,6 +954,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
>> struct task_struct *curr;
>> struct rq *rq;
>>
>> + if (p->nr_cpus_allowed == 1)
>> + goto out;
>> +
> This looks fine, and I'm wondering if we shouldn't just move this into
> kernel/sched/core.c: select_task_rq(). Why bother calling the select_rq
> code if the task is pinned?
>
> This change will make fair.c, rt.c, and deadline.c all start with the
> same logic. If this should be an optimization, just move it to core.c
> and be done with it.
Actually I have already done this for the dl class, and the patch is
currently in the tip tree; maybe pang.xunlei missed it. I will send a
patch to move them all to core.c soon.
Regards,
Wanpeng Li
>
>
>> if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
>> goto out;
>>
>> @@ -970,11 +973,14 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
>> * can!) we prefer to send it somewhere else. On the
>> * other hand, if it has a shorter deadline, we
>> * try to make it stay here, it might be important.
>> + *
>> + * If the current task on @p's runqueue is a non-DL task,
>> + * and this task is bound on current runqueue, then try to
>> + * see if we can wake this DL task up on a different runqueue,
>> */
>> - if (unlikely(dl_task(curr)) &&
>> - (curr->nr_cpus_allowed < 2 ||
>> - !dl_entity_preempt(&p->dl, &curr->dl)) &&
>> - (p->nr_cpus_allowed > 1)) {
>> + if (unlikely(curr->nr_cpus_allowed < 2) ||
>> + unlikely(dl_task(curr) &&
>> + !dl_entity_preempt(&p->dl, &curr->dl))) {
> This has the same issue as the rt.c change.
>
> -- Steve
>
>> int target = find_later_rq(p);
>>
>> if (target != -1)