From: Xunlei Pang <[email protected]>
Changing the affinity of a runnable rt task can leave a runqueue
overloaded with rt tasks, so whenever the affinity of any runnable
rt task changes, we should check whether balancing needs to be
triggered; otherwise the change can cause unnecessary real-time
response delays. Unfortunately, the current RT global scheduler
triggers nothing in this case.
For example: on a 2-cpu system, two runnable FIFO tasks with the
same rt_priority are bound to CPU0; call them rt1 (running) and
rt2 (runnable). CPU1 has no rt tasks. If someone then sets the
affinity of rt2 to 0x3 (i.e. CPU0 and CPU1), rt2 still cannot be
scheduled until rt1 enters schedule(), which can cause significant
response latency for rt2.
So, when set_cpus_allowed_rt() detects such a case, check whether
a push should be triggered.
Signed-off-by: Xunlei Pang <[email protected]>
---
v2:
Refine according to Steven Rostedt's comments.
kernel/sched/rt.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 70 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f4d4b07..b1ea9c0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1428,10 +1428,9 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
return next;
}
-static struct task_struct *_pick_next_task_rt(struct rq *rq)
+static struct task_struct *peek_next_task_rt(struct rq *rq)
{
struct sched_rt_entity *rt_se;
- struct task_struct *p;
struct rt_rq *rt_rq = &rq->rt;
do {
@@ -1440,7 +1439,14 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
rt_rq = group_rt_rq(rt_se);
} while (rt_rq);
- p = rt_task_of(rt_se);
+ return rt_task_of(rt_se);
+}
+
+static inline struct task_struct *_pick_next_task_rt(struct rq *rq)
+{
+ struct task_struct *p;
+
+ p = peek_next_task_rt(rq);
p->se.exec_start = rq_clock_task(rq);
return p;
@@ -1886,28 +1892,73 @@ static void set_cpus_allowed_rt(struct task_struct *p,
const struct cpumask *new_mask)
{
struct rq *rq;
- int weight;
+ int old_weight, new_weight;
+ int preempt_push = 0, direct_push = 0;
BUG_ON(!rt_task(p));
if (!task_on_rq_queued(p))
return;
- weight = cpumask_weight(new_mask);
+ old_weight = p->nr_cpus_allowed;
+ new_weight = cpumask_weight(new_mask);
+
+ rq = task_rq(p);
+
+ if (new_weight > 1 &&
+ rt_task(rq->curr) &&
+ !test_tsk_need_resched(rq->curr)) {
+ /*
+ * Set new mask information which is already valid
+ * to prepare pushing.
+ *
+ * We own p->pi_lock and rq->lock. rq->lock might
+ * get released when doing direct pushing, however
+ * p->pi_lock is always held, so it's safe to assign
+ * the new_mask and new_weight to p.
+ */
+ cpumask_copy(&p->cpus_allowed, new_mask);
+ p->nr_cpus_allowed = new_weight;
+
+ if (task_running(rq, p) &&
+ cpumask_test_cpu(task_cpu(p), new_mask) &&
+ cpupri_find(&rq->rd->cpupri, p, NULL)) {
+ /*
+ * At this point, current task gets migratable most
+ * likely due to the change of its affinity, let's
+ * figure out if we can migrate it.
+ *
+ * Is there any task with the same priority as that
+ * of current task? If found one, we should resched.
+ * NOTE: The target may be unpushable.
+ */
+ if (p->prio == rq->rt.highest_prio.next) {
+ /* One target just in pushable_tasks list. */
+ requeue_task_rt(rq, p, 0);
+ preempt_push = 1;
+ } else if (rq->rt.rt_nr_total > 1) {
+ struct task_struct *next;
+
+ requeue_task_rt(rq, p, 0);
+ next = peek_next_task_rt(rq);
+ if (next != p && next->prio == p->prio)
+ preempt_push = 1;
+ }
+ } else if (!task_running(rq, p))
+ direct_push = 1;
+ }
/*
* Only update if the process changes its state from whether it
* can migrate or not.
*/
- if ((p->nr_cpus_allowed > 1) == (weight > 1))
- return;
-
- rq = task_rq(p);
+ if ((old_weight > 1) == (new_weight > 1))
+ goto out;
/*
* The process used to be able to migrate OR it can now migrate
*/
- if (weight <= 1) {
+ if (new_weight <= 1) {
if (!task_current(rq, p))
dequeue_pushable_task(rq, p);
BUG_ON(!rq->rt.rt_nr_migratory);
@@ -1919,6 +1970,15 @@ static void set_cpus_allowed_rt(struct task_struct *p,
}
update_rt_migration(&rq->rt);
+
+out:
+ BUG_ON(direct_push == 1 && preempt_push == 1);
+
+ if (direct_push)
+ push_rt_tasks(rq);
+
+ if (preempt_push)
+ resched_curr(rq);
}
/* Assumes rq->lock is held */
--
1.9.1
From: Xunlei Pang <[email protected]>
check_preempt_curr() doesn't call sched_class::check_preempt_curr()
when the class of the current task is higher than that of the woken
task. So if a DL task is running when this happens for an RT task,
check_preempt_equal_prio() is never reached, which may cause response
latency for that RT task if it is pinned and some same-priority
migratable rt tasks are already queued.
We should do something similar in pick_next_task_rt() when first
picking an rt task after running out of DL tasks.
This patch tackles the issue by peeking the next rt task (RT1) and,
if RT1 is found to be migratable, requeuing it to the tail of the rq
using requeue_task_rt(rq, p, 0). In this way:
- If there is another rt task (RT2) with the same priority as
  RT1, RT2 will eventually be picked as the running task, while
  RT1 will be pushed onto another cpu via RT1's post_schedule(),
  since RT1 is migratable. The difference from
  check_preempt_equal_prio() here is that we simply don't care
  whether RT2 is migratable.
- Otherwise, if there is no rt task with the same priority as RT1,
  RT1 will still be picked as the running task after the requeuing.
Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/rt.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index b1ea9c0..17dcbfa 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1482,6 +1482,22 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
put_prev_task(rq, prev);
+#ifdef CONFIG_SMP
+ /*
+ * If there's a running higher class task, check_preempt_curr()
+ * doesn't invoke check_preempt_equal_prio() for rt tasks, so
+ * we can do the similar thing here.
+ */
+ if (rq->rt.rt_nr_total > 1 &&
+ (prev->sched_class == &dl_sched_class ||
+ prev->sched_class == &stop_sched_class)) {
+ p = peek_next_task_rt(rq);
+ if (p->nr_cpus_allowed != 1 &&
+ cpupri_find(&rq->rd->cpupri, p, NULL))
+ requeue_task_rt(rq, p, 0);
+ }
+#endif
+
p = _pick_next_task_rt(rq);
/* The running task is never eligible for pushing */
--
1.9.1
On Thu, 5 Feb 2015 23:59:33 +0800
Xunlei Pang <[email protected]> wrote:
return p;
> @@ -1886,28 +1892,73 @@ static void set_cpus_allowed_rt(struct task_struct *p,
> const struct cpumask *new_mask)
> {
> struct rq *rq;
> - int weight;
> + int old_weight, new_weight;
> + int preempt_push = 0, direct_push = 0;
>
> BUG_ON(!rt_task(p));
>
> if (!task_on_rq_queued(p))
> return;
>
> - weight = cpumask_weight(new_mask);
> + old_weight = p->nr_cpus_allowed;
> + new_weight = cpumask_weight(new_mask);
> +
> + rq = task_rq(p);
> +
> + if (new_weight > 1 &&
> + rt_task(rq->curr) &&
> + !test_tsk_need_resched(rq->curr)) {
> + /*
> + * Set new mask information which is already valid
> + * to prepare pushing.
> + *
> + * We own p->pi_lock and rq->lock. rq->lock might
> + * get released when doing direct pushing, however
> + * p->pi_lock is always held, so it's safe to assign
> + * the new_mask and new_weight to p.
> + */
> + cpumask_copy(&p->cpus_allowed, new_mask);
> + p->nr_cpus_allowed = new_weight;
> +
> + if (task_running(rq, p) &&
> + cpumask_test_cpu(task_cpu(p), new_mask) &&
Why the check for task_cpu being in new_mask?
> + cpupri_find(&rq->rd->cpupri, p, NULL)) {
> + /*
> + * At this point, current task gets migratable most
> + * likely due to the change of its affinity, let's
> + * figure out if we can migrate it.
> + *
> + * Is there any task with the same priority as that
> + * of current task? If found one, we should resched.
> + * NOTE: The target may be unpushable.
> + */
> + if (p->prio == rq->rt.highest_prio.next) {
> + /* One target just in pushable_tasks list. */
> + requeue_task_rt(rq, p, 0);
> + preempt_push = 1;
> + } else if (rq->rt.rt_nr_total > 1) {
> + struct task_struct *next;
> +
> + requeue_task_rt(rq, p, 0);
> + next = peek_next_task_rt(rq);
> + if (next != p && next->prio == p->prio)
> + preempt_push = 1;
> + }
> + } else if (!task_running(rq, p))
> + direct_push = 1;
We could avoid the second check (!task_running()) by splitting up the
first if:
if (task_running(rq, p)) {
if (cpumask_test_cpu() && cpupri_find()) {
}
} else {
direct push = 1
Also, is the copy of cpus_allowed only done so that cpupri_find is
called? If so maybe move it in there too:
if (task_running(rq, p)) {
if (!cpumask_test_cpu())
goto update;
cpumask_copy(&p->cpus_allowed, new_mask);
p->nr_cpus_allowed = new_weight;
if (!cpupri_find())
goto update;
[...]
This way we avoid the double copy of cpumask unless we truly need to do
it.
> + }
>
> /*
> * Only update if the process changes its state from whether it
> * can migrate or not.
> */
> - if ((p->nr_cpus_allowed > 1) == (weight > 1))
> - return;
> -
> - rq = task_rq(p);
> + if ((old_weight > 1) == (new_weight > 1))
> + goto out;
>
> /*
> * The process used to be able to migrate OR it can now migrate
> */
> - if (weight <= 1) {
> + if (new_weight <= 1) {
> if (!task_current(rq, p))
> dequeue_pushable_task(rq, p);
> BUG_ON(!rq->rt.rt_nr_migratory);
> @@ -1919,6 +1970,15 @@ static void set_cpus_allowed_rt(struct task_struct *p,
> }
>
> update_rt_migration(&rq->rt);
> +
> +out:
> + BUG_ON(direct_push == 1 && preempt_push == 1);
Do we really need this bug on?
> +
> + if (direct_push)
> + push_rt_tasks(rq);
> +
> + if (preempt_push)
We could make that an "else if" if they really are mutually exclusive.
> + resched_curr(rq);
> }
>
> /* Assumes rq->lock is held */
-- Steve
Hi Steve,
On 7 February 2015 at 05:09, Steven Rostedt <[email protected]> wrote:
> On Thu, 5 Feb 2015 23:59:33 +0800
>> +
>> + if (task_running(rq, p) &&
>> + cpumask_test_cpu(task_cpu(p), new_mask) &&
>
> Why the check for task_cpu being in new_mask?
If the current cpu of this task is not in the new_mask,
it will get migrated by set_cpus_allowed_ptr(), so we
don't need to resched.
>
>> + cpupri_find(&rq->rd->cpupri, p, NULL)) {
>> + /*
>> + * At this point, current task gets migratable most
>> + * likely due to the change of its affinity, let's
>> + * figure out if we can migrate it.
>> + *
>> + * Is there any task with the same priority as that
>> + * of current task? If found one, we should resched.
>> + * NOTE: The target may be unpushable.
>> + */
>> + if (p->prio == rq->rt.highest_prio.next) {
>> + /* One target just in pushable_tasks list. */
>> + requeue_task_rt(rq, p, 0);
>> + preempt_push = 1;
>> + } else if (rq->rt.rt_nr_total > 1) {
>> + struct task_struct *next;
>> +
>> + requeue_task_rt(rq, p, 0);
>> + next = peek_next_task_rt(rq);
>> + if (next != p && next->prio == p->prio)
>> + preempt_push = 1;
>> + }
>> + } else if (!task_running(rq, p))
>> + direct_push = 1;
>
> We could avoid the second check (!task_running()) by splitting up the
> first if:
ok, I'll adjust it.
>
> if (task_running(rq, p)) {
> if (cpumask_test_cpu() && cpupri_find()) {
> }
> } else {
> direct push = 1
>
> Also, is the copy of cpus_allowed only done so that cpupri_find is
> called? If so maybe move it in there too:
>
> if (task_running(rq, p)) {
> if (!cpumask_test_cpu())
> goto update;
>
> cpumask_copy(&p->cpus_allowed, new_mask);
> p->nr_cpus_allowed = new_weight;
>
> if (!cpupri_find())
> goto update;
>
> [...]
>
> This way we avoid the double copy of cpumask unless we truly need to do
> it.
The new_mask can also be used by direct_push case, so I think it's ok.
>
>> + }
>>
>> /*
>> * Only update if the process changes its state from whether it
>> * can migrate or not.
>> */
>> - if ((p->nr_cpus_allowed > 1) == (weight > 1))
>> - return;
>> -
>> - rq = task_rq(p);
>> + if ((old_weight > 1) == (new_weight > 1))
>> + goto out;
>>
>> /*
>> * The process used to be able to migrate OR it can now migrate
>> */
>> - if (weight <= 1) {
>> + if (new_weight <= 1) {
>> if (!task_current(rq, p))
>> dequeue_pushable_task(rq, p);
>> BUG_ON(!rq->rt.rt_nr_migratory);
>> @@ -1919,6 +1970,15 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>> }
>>
>> update_rt_migration(&rq->rt);
>> +
>> +out:
>> + BUG_ON(direct_push == 1 && preempt_push == 1);
>
> Do we really need this bug on?
>
>> +
>> + if (direct_push)
>> + push_rt_tasks(rq);
>> +
>> + if (preempt_push)
>
> We could make that an "else if" if they really are mutually exclusive.
>
I'll fix those things, and resend another version.
Thanks,
Xunlei
On 8 February 2015 at 22:55, Xunlei Pang <[email protected]> wrote:
> Hi Steve,
>
> On 7 February 2015 at 05:09, Steven Rostedt <[email protected]> wrote:
>> On Thu, 5 Feb 2015 23:59:33 +0800
>>
>> if (task_running(rq, p)) {
>> if (cpumask_test_cpu() && cpupri_find()) {
>> }
>> } else {
>> direct push = 1
>>
>> Also, is the copy of cpus_allowed only done so that cpupri_find is
>> called? If so maybe move it in there too:
>>
>> if (task_running(rq, p)) {
>> if (!cpumask_test_cpu())
>> goto update;
>>
>> cpumask_copy(&p->cpus_allowed, new_mask);
>> p->nr_cpus_allowed = new_weight;
>>
>> if (!cpupri_find())
>> goto update;
>>
>> [...]
>>
>> This way we avoid the double copy of cpumask unless we truly need to do
>> it.
>
> The new_mask can also be used by direct_push case, so I think it's ok.
I guess you mean avoiding the copy when cpumask_test_cpu() is false.
Since this function is not a hot path, I think restructuring it that
way would just add more indentation levels and hurt readability.
Thanks,
Xunlei