2015-04-26 17:12:57

by Xunlei Pang

Subject: [RFC PATCH 1/6] sched/rt: Provide new check_preempt_equal_prio_common()

From: Xunlei Pang <[email protected]>

When p is queued, there may be other tasks already queued at the
same priority in the "run queue", so we should peek the frontmost
one when doing the equal-priority preemption.

This patch modifies check_preempt_equal_prio() and provides a new
check_preempt_equal_prio_common() that does the common preemption
work.

Later patches in this series add further callers of the new
interface.
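
For illustration, consider this hypothetical case (t1 and t2 are
example names, not identifiers from the code):

    /*
     * prio-10 run queue: t1, t2 queued; curr (also prio 10) running.
     * p == t2 has just been queued. The old code asked "can we push
     * curr away so that p runs?", but even if curr is pushed away,
     * t1 is at the front of the queue and must run before t2. The
     * equal-priority check therefore has to look at the peeked
     * front task (t1 here), not at whichever task was queued last.
     */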

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/rt.c | 70 ++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 575da76..6b40555 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1366,33 +1366,66 @@ out:
return cpu;
}

-static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
+static struct task_struct *peek_next_task_rt(struct rq *rq);
+
+static void check_preempt_equal_prio_common(struct rq *rq)
{
+ struct task_struct *curr = rq->curr;
+ struct task_struct *next;
+
+ /* Current can't be migrated, useless to reschedule */
+ if (curr->nr_cpus_allowed == 1 ||
+ !cpupri_find(&rq->rd->cpupri, curr, NULL))
+ return;
+
/*
- * Current can't be migrated, useless to reschedule,
- * let's hope p can move out.
+ * Can we find any task with the same priority as
+ * curr? To find out, first requeue curr to the tail,
+ * then peek the next task, and finally put curr back
+ * at the head if a different task was peeked.
*/
- if (rq->curr->nr_cpus_allowed == 1 ||
- !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
+ requeue_task_rt(rq, curr, 0);
+ next = peek_next_task_rt(rq);
+ if (next == curr)
+ return;
+
+ requeue_task_rt(rq, curr, 1);
+
+ if (next->prio != curr->prio)
return;

/*
- * p is migratable, so let's not schedule it and
- * see if it is pushed or pulled somewhere else.
+ * Got a queued next task with the same priority as
+ * current. If next is migratable, don't reschedule;
+ * let it be pushed or pulled somewhere else instead.
*/
- if (p->nr_cpus_allowed != 1
- && cpupri_find(&rq->rd->cpupri, p, NULL))
+ if (next->nr_cpus_allowed != 1 &&
+ cpupri_find(&rq->rd->cpupri, next, NULL))
return;

/*
* There appears to be other cpus that can accept
- * current and none to run 'p', so lets reschedule
- * to try and push current away:
+ * current and none to run next, so let's reschedule
+ * to try and push current away.
*/
- requeue_task_rt(rq, p, 1);
+ requeue_task_rt(rq, next, 1);
resched_curr(rq);
}

+static inline
+void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
+{
+ /*
+ * p is migratable, so let's not schedule it and
+ * see if it is pushed or pulled somewhere else.
+ */
+ if (p->nr_cpus_allowed != 1 &&
+ cpupri_find(&rq->rd->cpupri, p, NULL))
+ return;
+
+ check_preempt_equal_prio_common(rq);
+}
+
#endif /* CONFIG_SMP */

/*
@@ -1440,10 +1473,9 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
return next;
}

-static struct task_struct *_pick_next_task_rt(struct rq *rq)
+static struct task_struct *peek_next_task_rt(struct rq *rq)
{
struct sched_rt_entity *rt_se;
- struct task_struct *p;
struct rt_rq *rt_rq = &rq->rt;

do {
@@ -1452,9 +1484,15 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
rt_rq = group_rt_rq(rt_se);
} while (rt_rq);

- p = rt_task_of(rt_se);
- p->se.exec_start = rq_clock_task(rq);
+ return rt_task_of(rt_se);
+}

+static inline struct task_struct *_pick_next_task_rt(struct rq *rq)
+{
+ struct task_struct *p;
+
+ p = peek_next_task_rt(rq);
+ p->se.exec_start = rq_clock_task(rq);
return p;
}

--
1.9.1


2015-04-26 17:12:25

by Xunlei Pang

Subject: [RFC PATCH 2/6] sched/rt: Check to push task away when its affinity is changed

From: Xunlei Pang <[email protected]>

We may end up with an unnecessarily overloaded rt rq due to task
affinity, so when the affinity of any runnable rt task is changed,
we should check whether to trigger balancing; otherwise real-time
response may be delayed unnecessarily. Unfortunately, the current
global RT scheduler does nothing about this.

For example: consider a 2-cpu system with two runnable FIFO tasks
of the same rt_priority bound to CPU0; let's name them rt1 (running)
and rt2 (runnable) respectively, while CPU1 has no RT tasks. If
someone then sets the affinity of rt2 to 0x3 (i.e. CPU0 and CPU1),
rt2 still cannot be scheduled until rt1 enters schedule(), which
causes a potentially large response latency for rt2.

This patch introduces a new sched_class::post_set_cpus_allowed()
hook for rt, called after set_cpus_allowed_rt(). The new function
triggers a push when it detects such cases.
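
For illustration, the scenario above can be reproduced from user
space with something like the following sketch (widen_affinity and
rt2_pid are hypothetical names used only for this example):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/types.h>

    /* Widen rt2's affinity from CPU0 to CPU0+CPU1 (mask 0x3).
     * Without this patch, rt2 can stay stuck behind rt1 on CPU0
     * even though CPU1 is idle, until rt1 enters schedule(). */
    static int widen_affinity(pid_t rt2_pid)
    {
            cpu_set_t mask;

            CPU_ZERO(&mask);
            CPU_SET(0, &mask);
            CPU_SET(1, &mask);
            return sched_setaffinity(rt2_pid, sizeof(mask), &mask);
    }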

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/core.c | 3 +++
kernel/sched/rt.c | 24 ++++++++++++++++++++++++
kernel/sched/sched.h | 1 +
3 files changed, 28 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..dc646a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4769,6 +4769,9 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)

cpumask_copy(&p->cpus_allowed, new_mask);
p->nr_cpus_allowed = cpumask_weight(new_mask);
+
+ if (p->sched_class->post_set_cpus_allowed)
+ p->sched_class->post_set_cpus_allowed(p);
}

/*
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6b40555..7b76747 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2136,6 +2136,29 @@ static void set_cpus_allowed_rt(struct task_struct *p,
update_rt_migration(&rq->rt);
}

+static void post_set_cpus_allowed_rt(struct task_struct *p)
+{
+ struct rq *rq;
+
+ if (!task_on_rq_queued(p))
+ return;
+
+ rq = task_rq(p);
+ if (p->nr_cpus_allowed > 1 &&
+ rq->rt.rt_nr_running > 1 &&
+ rt_task(rq->curr) && !test_tsk_need_resched(rq->curr)) {
+ if (!task_running(rq, p)) {
+ push_rt_task(rq);
+ } else if (cpumask_test_cpu(task_cpu(p), &p->cpus_allowed)) {
+ /*
+ * p (i.e. current) may have become migratable due
+ * to its affinity change, so let's try to push it away.
+ */
+ check_preempt_equal_prio_common(rq);
+ }
+ }
+}
+
/* Assumes rq->lock is held */
static void rq_online_rt(struct rq *rq)
{
@@ -2350,6 +2373,7 @@ const struct sched_class rt_sched_class = {
.select_task_rq = select_task_rq_rt,

.set_cpus_allowed = set_cpus_allowed_rt,
+ .post_set_cpus_allowed = post_set_cpus_allowed_rt,
.rq_online = rq_online_rt,
.rq_offline = rq_offline_rt,
.post_schedule = post_schedule_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..6f90645 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1191,6 +1191,7 @@ struct sched_class {

void (*set_cpus_allowed)(struct task_struct *p,
const struct cpumask *newmask);
+ void (*post_set_cpus_allowed)(struct task_struct *p);

void (*rq_online)(struct rq *rq);
void (*rq_offline)(struct rq *rq);
--
1.9.1

2015-04-26 17:12:53

by Xunlei Pang

Subject: [RFC PATCH 3/6] sched/rt: Check to push FIFO current away at each tick

From: Xunlei Pang <[email protected]>

There may be non-migratable tasks queued in the "run queue" at
the same priority as current, which is FIFO and migratable, so at
each tick we can check and try to push current away to give these
tasks a chance to run (we don't do this for tasks queued at lower
priority).

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/rt.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7b76747..ddd5b19 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2314,10 +2314,17 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)

/*
* RR tasks need a special form of timeslice management.
- * FIFO tasks have no timeslices.
+ * FIFO tasks have no timeslices. But if p(current) is a
+ * FIFO task, try to push it away.
*/
- if (p->policy != SCHED_RR)
+ if (p->policy != SCHED_RR) {
+ if (p->nr_cpus_allowed > 1 &&
+ rq->rt.rt_nr_running > 1 &&
+ !test_tsk_need_resched(p))
+ check_preempt_equal_prio_common(rq);
+
return;
+ }

if (--p->rt.time_slice)
return;
--
1.9.1

2015-04-26 17:12:29

by Xunlei Pang

Subject: [RFC PATCH 4/6] lib/plist: Provide plist_add_head() for nodes with the same prio

From: Xunlei Pang <[email protected]>

If there are multiple nodes with the same prio as @node, plist_add()
currently adds @node behind all of them. For the SMP RT scheduler we
now also need to be able to add @node before all of these nodes.

This patch adds a common __plist_add() for adding @node before or
after existing nodes with the same prio, then adds plist_add_head()
and plist_add_tail() inline wrapper functions for convenience.

Finally, plist_add() becomes an inline wrapper around plist_add_tail(),
which preserves its previous behaviour.
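
For illustration, a minimal usage sketch of the intended semantics
(a, b, c are hypothetical nodes, all at the same prio; kernel-internal
API, shown out of context):

    struct plist_head head;
    struct plist_node a, b, c;

    plist_head_init(&head);
    plist_node_init(&a, 5);
    plist_node_init(&b, 5);
    plist_node_init(&c, 5);

    plist_add_tail(&a, &head);      /* same-prio order: a */
    plist_add_tail(&b, &head);      /* same-prio order: a, b */
    plist_add_head(&c, &head);      /* same-prio order: c, a, b */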

Reviewed-by: Dan Streetman <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Signed-off-by: Xunlei Pang <[email protected]>
---
include/linux/plist.h | 34 +++++++++++++++++++++++++++++++++-
lib/plist.c | 28 +++++++++++++++++++++++-----
2 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/include/linux/plist.h b/include/linux/plist.h
index 9788360..d060f28 100644
--- a/include/linux/plist.h
+++ b/include/linux/plist.h
@@ -138,7 +138,39 @@ static inline void plist_node_init(struct plist_node *node, int prio)
INIT_LIST_HEAD(&node->node_list);
}

-extern void plist_add(struct plist_node *node, struct plist_head *head);
+extern void __plist_add(struct plist_node *node,
+ struct plist_head *head, bool is_head);
+
+/**
+ * plist_add_head - add @node to @head, before all existing same-prio nodes
+ *
+ * @node: The plist_node to be added to @head
+ * @head: The plist_head that @node is being added to
+ */
+static inline
+void plist_add_head(struct plist_node *node, struct plist_head *head)
+{
+ __plist_add(node, head, true);
+}
+
+/**
+ * plist_add_tail - add @node to @head, after all existing same-prio nodes
+ *
+ * @node: The plist_node to be added to @head
+ * @head: The plist_head that @node is being added to
+ */
+static inline
+void plist_add_tail(struct plist_node *node, struct plist_head *head)
+{
+ __plist_add(node, head, false);
+}
+
+static inline
+void plist_add(struct plist_node *node, struct plist_head *head)
+{
+ plist_add_tail(node, head);
+}
+
extern void plist_del(struct plist_node *node, struct plist_head *head);

extern void plist_requeue(struct plist_node *node, struct plist_head *head);
diff --git a/lib/plist.c b/lib/plist.c
index 3a30c53..c1ee2b0 100644
--- a/lib/plist.c
+++ b/lib/plist.c
@@ -66,12 +66,18 @@ static void plist_check_head(struct plist_head *head)
#endif

/**
- * plist_add - add @node to @head
+ * __plist_add - add @node to @head
*
- * @node: &struct plist_node pointer
- * @head: &struct plist_head pointer
+ * @node: The plist_node to be added to @head
+ * @head: The plist_head that @node is being added to
+ * @is_head: True if adding to head of prio list, false otherwise
+ *
+ * For nodes of the same prio, @node will be added at the
+ * head of previously added nodes if @is_head is true, or
+ * it will be added at the tail of previously added nodes
+ * if @is_head is false.
*/
-void plist_add(struct plist_node *node, struct plist_head *head)
+void __plist_add(struct plist_node *node, struct plist_head *head, bool is_head)
{
struct plist_node *first, *iter, *prev = NULL;
struct list_head *node_next = &head->node_list;
@@ -96,8 +102,20 @@ void plist_add(struct plist_node *node, struct plist_head *head)
struct plist_node, prio_list);
} while (iter != first);

- if (!prev || prev->prio != node->prio)
+ if (!prev || prev->prio != node->prio) {
list_add_tail(&node->prio_list, &iter->prio_list);
+ } else if (is_head) {
+ /*
+ * prev has the same priority as the node that is being
+ * added. It is also the first node for this priority,
+ * but the new node needs to be added ahead of it.
+ * To accomplish this, replace prev in the prio_list
+ * with node. Then set node_next to prev->node_list so
+ * that the new node gets added before prev and not iter.
+ */
+ list_replace_init(&prev->prio_list, &node->prio_list);
+ node_next = &prev->node_list;
+ }
ins_node:
list_add_tail(&node->node_list, node_next);

--
1.9.1

2015-04-26 17:13:49

by Xunlei Pang

Subject: [RFC PATCH 5/6] sched/rt: Fix wrong SMP scheduler behavior for equal prio cases

From: Xunlei Pang <[email protected]>

There are two main per-cpu queues in the RT scheduler; let's call
them the "run queue" and the "pushable queue" respectively.

For RT tasks, the scheduler uses a "plist" to manage the pushable
queue, so when multiple tasks are queued at the same priority,
they are kept in strict FIFO order.

Currently, when an rt task gets queued, it is put at the head or
the tail of its priority level in the "run queue" depending on the
scenario. If it is migratable, it is then always put at the tail
of its priority level in the "pushable queue".

Consider one cpu that initially has some migratable tasks queued
at the same priority as current (an RT task), in the same order in
both the "run queue" and the "pushable queue". When current gets
preempted, it is put behind these tasks in the "pushable queue",
while it still stays ahead of them in the "run queue". If a pull
from another cpu or a push from the local cpu follows, a task
behind current in the "run queue" is removed from the "pushable
queue" and gets to run, because the global rt scheduler fetches
tasks from the head of the "pushable queue" when pulling or pushing.

Obviously, to maintain the same order between the two queues, when
current is preempted (without being requeued within the "run queue"),
we want to put it ahead of all tasks queued at the same priority in
the "pushable queue".

So, when a running task gets preempted, whether by a higher priority
task or by an equal priority task for migration purposes, this patch
ensures that it is put ahead of any existing task with the same
priority in the "pushable queue".

The logic used here:
- Add a new field named "rt_preempt" to task_struct (with a new flag
named RT_PREEMPT_QUEUEAHEAD defined for it), used by RT.
- When doing a preemption resched_curr() on an RT current, set the
flag. A new resched_curr_preempted_rt() is created for this, and all
resched_curr() calls used for rt preemption are replaced with
resched_curr_preempted_rt().
- In put_prev_task_rt(), if RT_PREEMPT_QUEUEAHEAD is set, enqueue the
task at the head of the "pushable queue".

Signed-off-by: Xunlei Pang <[email protected]>
---
include/linux/sched.h | 5 +++
include/linux/sched/rt.h | 16 ++++++++
kernel/sched/core.c | 6 ++-
kernel/sched/rt.c | 96 ++++++++++++++++++++++++++++++++++++++++++------
4 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f74d4cc..24e0f72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1321,6 +1321,11 @@ struct task_struct {
const struct sched_class *sched_class;
struct sched_entity se;
struct sched_rt_entity rt;
+
+#ifdef CONFIG_SMP
+ unsigned long rt_preempt; /* Used by rt */
+#endif
+
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
#endif
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 6341f5b..69e3c82 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -15,6 +15,22 @@ static inline int rt_task(struct task_struct *p)
return rt_prio(p->prio);
}

+struct rq;
+
+#ifdef CONFIG_SMP
+extern void resched_curr_preempted_rt(struct rq *rq);
+
+static inline void resched_curr_preempted(struct rq *rq)
+{
+ resched_curr_preempted_rt(rq);
+}
+#else
+static inline void resched_curr_preempted(struct rq *rq)
+{
+ resched_curr(rq);
+}
+#endif
+
#ifdef CONFIG_RT_MUTEXES
extern int rt_mutex_getprio(struct task_struct *p);
extern void rt_mutex_setprio(struct task_struct *p, int prio);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dc646a6..64a1603 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1002,7 +1002,7 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
if (class == rq->curr->sched_class)
break;
if (class == p->sched_class) {
- resched_curr(rq);
+ resched_curr_preempted(rq);
break;
}
}
@@ -1833,6 +1833,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)

INIT_LIST_HEAD(&p->rt.run_list);

+#ifdef CONFIG_SMP
+ p->rt_preempt = 0;
+#endif
+
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
#endif
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ddd5b19..e7d66eb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -254,8 +254,33 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
}
#endif /* CONFIG_RT_GROUP_SCHED */

+
#ifdef CONFIG_SMP

+#define RT_PREEMPT_QUEUEAHEAD 1UL
+
+/*
+ * p (current) was preempted and is to be put ahead of
+ * any task with the same priority in the pushable queue.
+ */
+static inline bool rt_preempted(struct task_struct *p)
+{
+ return !!(p->rt_preempt & RT_PREEMPT_QUEUEAHEAD);
+}
+
+static inline void clear_rt_preempt(struct task_struct *p)
+{
+ p->rt_preempt = 0;
+}
+
+void resched_curr_preempted_rt(struct rq *rq)
+{
+ if (rt_task(rq->curr))
+ rq->curr->rt_preempt |= RT_PREEMPT_QUEUEAHEAD;
+
+ resched_curr(rq);
+}
+
static int pull_rt_task(struct rq *this_rq);

static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
@@ -359,17 +384,32 @@ static inline void set_post_schedule(struct rq *rq)
rq->post_schedule = has_pushable_tasks(rq);
}

-static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+static void
+__enqueue_pushable_task(struct rq *rq, struct task_struct *p, bool head)
{
plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
plist_node_init(&p->pushable_tasks, p->prio);
- plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
+ if (head)
+ plist_add_head(&p->pushable_tasks, &rq->rt.pushable_tasks);
+ else
+ plist_add_tail(&p->pushable_tasks, &rq->rt.pushable_tasks);

/* Update the highest prio pushable task */
if (p->prio < rq->rt.highest_prio.next)
rq->rt.highest_prio.next = p->prio;
}

+static inline
+void enqueue_pushable_task_preempted(struct rq *rq, struct task_struct *curr)
+{
+ __enqueue_pushable_task(rq, curr, true);
+}
+
+static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+{
+ __enqueue_pushable_task(rq, p, false);
+}
+
static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
{
plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
@@ -385,6 +425,25 @@ static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)

#else

+static inline bool rt_preempted(struct task_struct *p)
+{
+ return false;
+}
+
+static inline void clear_rt_preempt(struct task_struct *p)
+{
+}
+
+static inline void resched_curr_preempted_rt(struct rq *rq)
+{
+ resched_curr(rq);
+}
+
+static inline
+void enqueue_pushable_task_preempted(struct rq *rq, struct task_struct *p)
+{
+}
+
static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
{
}
@@ -489,7 +548,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
enqueue_rt_entity(rt_se, false);

if (rt_rq->highest_prio.curr < curr->prio)
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}
}

@@ -967,7 +1026,7 @@ static void update_curr_rt(struct rq *rq)
raw_spin_lock(&rt_rq->rt_runtime_lock);
rt_rq->rt_time += delta_exec;
if (sched_rt_runtime_exceeded(rt_rq))
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
raw_spin_unlock(&rt_rq->rt_runtime_lock);
}
}
@@ -1409,7 +1468,7 @@ static void check_preempt_equal_prio_common(struct rq *rq)
* to try and push current away.
*/
requeue_task_rt(rq, next, 1);
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}

static inline
@@ -1434,7 +1493,7 @@ void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flags)
{
if (p->prio < rq->curr->prio) {
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
return;
}

@@ -1544,8 +1603,21 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
* The previous task needs to be made eligible for pushing
* if it is still active
*/
- if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
- enqueue_pushable_task(rq, p);
+ if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1) {
+ /*
+ * When put_prev_task_rt() is called by
+ * pick_next_task_rt(), if the current rt task
+ * is being preempted, to maintain FIFO, it must
+ * stay ahead of any other task that is queued
+ * at the same priority.
+ */
+ if (rt_preempted(p))
+ enqueue_pushable_task_preempted(rq, p);
+ else
+ enqueue_pushable_task(rq, p);
+ }
+
+ clear_rt_preempt(p);
}

#ifdef CONFIG_SMP
@@ -1764,7 +1836,7 @@ retry:
* just reschedule current.
*/
if (unlikely(next_task->prio < rq->curr->prio)) {
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
return 0;
}

@@ -1811,7 +1883,7 @@ retry:
activate_task(lowest_rq, next_task, 0);
ret = 1;

- resched_curr(lowest_rq);
+ resched_curr_preempted_rt(lowest_rq);

double_unlock_balance(rq, lowest_rq);

@@ -2236,7 +2308,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
check_resched = 0;
#endif /* CONFIG_SMP */
if (check_resched && p->prio < rq->curr->prio)
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}
}

@@ -2278,7 +2350,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
* then reschedule.
*/
if (p->prio < rq->curr->prio)
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}
}

--
1.9.1

2015-04-26 17:12:48

by Xunlei Pang

Subject: [RFC PATCH 6/6] sched/rt: Requeue p back if the preemption of check_preempt_equal_prio_common() failed

From: Xunlei Pang <[email protected]>

check_preempt_equal_prio_common() requeues "next" at the head of
the "run queue" and wants to push current away. But if the system
state changes before the actual push is done, the push may fail.

In that case, next eventually becomes the new current and starts
running, while the previous current is queued back to wait in the
same "run queue". This breaks FIFO ordering.

This patch adds a flag named RT_PREEMPT_PUSHAWAY for task_struct::
rt_preempt, sets it in check_preempt_equal_prio_common(), and
clears it once current is pushed away (i.e. when it is dequeued).
We can then test this flag in the new current's post_schedule_rt()
to judge whether the push happened. If the push failed, requeue
the previous current back at the head of its "run queue" and
trigger a reschedule.
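
For illustration, the failure scenario and the recovery path (a
sketch using the names from the code below):

    /*
     * curr and next share the same priority.
     *  1. check_preempt_equal_prio_common() requeues next at the
     *     head of the run queue and marks curr (and next) with
     *     RT_PREEMPT_PUSHAWAY.
     *  2. __schedule() picks next; the push logic tries to move
     *     curr to another cpu.
     *  3. If the candidate cpus changed state meanwhile, the push
     *     fails and curr stays queued behind next: FIFO is broken.
     *  4. post_schedule_rt() on next sees curr's flag still set and
     *     curr still on this rq, so it requeues curr at the head
     *     and reschedules.
     */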

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/rt.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 67 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e7d66eb..94789f1 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -258,6 +258,8 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
#ifdef CONFIG_SMP

#define RT_PREEMPT_QUEUEAHEAD 1UL
+#define RT_PREEMPT_PUSHAWAY 2UL
+#define RT_PREEMPT_MASK 3UL

/*
* p(current) was preempted, and to be put ahead of
@@ -268,6 +270,22 @@ static inline bool rt_preempted(struct task_struct *p)
return !!(p->rt_preempt & RT_PREEMPT_QUEUEAHEAD);
}

+static inline struct task_struct *rt_preempting_target(struct task_struct *p)
+{
+ return (struct task_struct *) (p->rt_preempt & ~RT_PREEMPT_MASK);
+}
+
+/*
+ * p(new current) is preempting and pushing previous current away.
+ */
+static inline bool rt_preempting(struct task_struct *p)
+{
+ if ((p->rt_preempt & RT_PREEMPT_PUSHAWAY) && rt_preempting_target(p))
+ return true;
+
+ return false;
+}
+
static inline void clear_rt_preempt(struct task_struct *p)
{
p->rt_preempt = 0;
@@ -375,13 +393,17 @@ static inline int has_pushable_tasks(struct rq *rq)
return !plist_head_empty(&rq->rt.pushable_tasks);
}

-static inline void set_post_schedule(struct rq *rq)
+static inline void set_post_schedule(struct rq *rq, struct task_struct *p)
{
- /*
- * We detect this state here so that we can avoid taking the RQ
- * lock again later if there is no need to push
- */
- rq->post_schedule = has_pushable_tasks(rq);
+ if (rt_preempting(p))
+ /* Forced post schedule */
+ rq->post_schedule = 1;
+ else
+ /*
+ * We detect this state here so that we can avoid taking
+ * the RQ lock again later if there is no need to push
+ */
+ rq->post_schedule = has_pushable_tasks(rq);
}

static void
@@ -430,6 +452,11 @@ static inline bool rt_preempted(struct task_struct *p)
return false;
}

+static inline bool rt_preempting(struct task_struct *p)
+{
+ return false;
+}
+
static inline void clear_rt_preempt(struct task_struct *p)
{
}
@@ -472,7 +499,7 @@ static inline int pull_rt_task(struct rq *this_rq)
return 0;
}

-static inline void set_post_schedule(struct rq *rq)
+static inline void set_post_schedule(struct rq *rq, struct task_struct *p)
{
}
#endif /* CONFIG_SMP */
@@ -1330,6 +1357,7 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
dequeue_rt_entity(rt_se);

dequeue_pushable_task(rq, p);
+ clear_rt_preempt(p);
}

/*
@@ -1468,6 +1496,11 @@ static void check_preempt_equal_prio_common(struct rq *rq)
* to try and push current away.
*/
requeue_task_rt(rq, next, 1);
+
+ get_task_struct(curr);
+ curr->rt_preempt |= RT_PREEMPT_PUSHAWAY;
+ next->rt_preempt = (unsigned long) curr;
+ next->rt_preempt |= RT_PREEMPT_PUSHAWAY;
resched_curr_preempted_rt(rq);
}

@@ -1590,7 +1623,7 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
/* The running task is never eligible for pushing */
dequeue_pushable_task(rq, p);

- set_post_schedule(rq);
+ set_post_schedule(rq, p);

return p;
}
@@ -2151,6 +2184,32 @@ skip:
static void post_schedule_rt(struct rq *rq)
{
push_rt_tasks(rq);
+
+ if (rt_preempting(current)) {
+ struct task_struct *target;
+
+ /* Read target before clearing the flag word it is stored in. */
+ target = rt_preempting_target(current);
+ current->rt_preempt = 0;
+ if (!(target->rt_preempt & RT_PREEMPT_PUSHAWAY))
+ goto out;
+
+ /*
+ * target still has RT_PREEMPT_PUSHAWAY set, which
+ * means it hasn't been pushed away successfully.
+ * If it is still on this rq, the push failed, so
+ * restore the former status of current and target.
+ */
+ if (!task_on_rq_queued(target) ||
+ task_cpu(target) != rq->cpu)
+ goto out;
+
+ /* target is previous current, requeue it back ahead. */
+ requeue_task_rt(rq, target, 1);
+ /* Let's preempt current, loop back to __schedule(). */
+ resched_curr_preempted_rt(rq);
+out:
+ put_task_struct(target);
+ }
}

/*
--
1.9.1

2015-04-26 18:13:05

by Steven Rostedt

Subject: Re: [RFC PATCH 3/6] sched/rt: Check to push FIFO current away at each tick

On Mon, 27 Apr 2015 01:10:55 +0800
Xunlei Pang <[email protected]> wrote:

> From: Xunlei Pang <[email protected]>
>
> There may be non-migratable tasks queued in the "run queue" at
> the same priority as current, which is FIFO and migratable, so at
> each tick we can check and try to push current away to give these
> tasks a chance to run (we don't do this for tasks queued at lower
> priority).

Why do we care?

This patch adds overhead to the tick for apparently no benefit. If you
have two tasks of the same priority where one is migratable and the
other is not, then that's your problem. There's no specification that I
know of that states we need to push the currently running task off to
another CPU just so that we can run a pinned one. What happens if the
other CPU we pushed the rt task to suddenly gets a pinned task
scheduled there? We end up pushing that rt task again. This is not the
kernel's problem.

The simple solution is to make sure that migratable RT tasks are at
different priorities than pinned RT tasks. There, problem solved.

-- Steve


>
> Signed-off-by: Xunlei Pang <[email protected]>

2015-04-26 18:33:09

by Steven Rostedt

Subject: Re: [RFC PATCH 1/6] sched/rt: Provide new check_preempt_equal_prio_common()

On Mon, 27 Apr 2015 01:10:53 +0800
Xunlei Pang <[email protected]> wrote:

> From: Xunlei Pang <[email protected]>
>
> When p is queued, there may be other tasks already queued at the
> same priority in the "run queue", so we should peek the frontmost
> one when doing the equal-priority preemption.
>
> This patch modifies check_preempt_equal_prio() and provides a new
> check_preempt_equal_prio_common() that does the common preemption
> work.

This change log fails to explain why this is needed.

>
> Later patches in this series add further callers of the new
> interface.

A change log should stand on its own, and not just say that other
patches will use it. It can say that, but only if it also fully
explains why the change is needed.

Also note, you are missing a 0/6 email that explains the entire
rationale of the patch set, so that people better understand what
you intend to accomplish. We have no idea what your overall goal is.

-- Steve

>
> Signed-off-by: Xunlei Pang <[email protected]>
> ---
> kernel/sched/rt.c | 70 ++++++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 54 insertions(+), 16 deletions(-)
>