2015-04-27 06:50:15

by Xunlei Pang

Subject: [RFC PATCH RESEND 0/4] Maintain the FIFO order for same priority RT tasks

From: Xunlei Pang <[email protected]>

1. TWO FIFO QUEUES
As we know, there are two main queues on each cpu for the RT SMP scheduler.
Let's name them "run queue" and "pushable queue" respectively.

"run queue" is from the intra-cpu's perspective, while "pushable queue" is
from the inter-cpu's perspective. "pushable queue" also has a very close
relationship with task affinity. The two queues separately maintain their
FIFO order for tasks queued at the same priority.

If either queue ends up in a wrong FIFO order, we say the kernel broke FIFO.
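
For reference, here is a minimal sketch of how the two queues hang off each
cpu (an illustrative reduction only; the real definitions live in
kernel/sched/sched.h and carry many more fields):

/*
 * Illustrative reduction of the per-cpu RT scheduling state,
 * not the kernel's full definitions.
 */
struct rt_prio_array {
        DECLARE_BITMAP(bitmap, MAX_RT_PRIO + 1);
        /* One FIFO list per priority level: the "run queue". */
        struct list_head queue[MAX_RT_PRIO];
};

struct rt_rq {
        struct rt_prio_array active;    /* the "run queue" */
#ifdef CONFIG_SMP
        /*
         * Priority-sorted plist, FIFO within each priority:
         * the "pushable queue".
         */
        struct plist_head pushable_tasks;
#endif
        /* ... */
};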


2. THE WAY THEY WORK
Currently, when an rt task gets queued, it is put at the head or the tail
of its "run queue" at the same priority, depending on the scenario. Further,
if it is migratable, it is also and always put at the tail of its "pushable
queue" at the same priority, because a "plist" is used to manage this queue
in strict FIFO order. The rules are as follows (a compact code sketch
follows the list):

a) When a task is enqueued or dequeued, it is added to or removed from the
"run queue", and likewise the "pushable queue" (added to the tail) if it is
migratable.

b) When a runnable task acquires the cpu (becomes current), it stays at the
front of the "run queue", but is removed from the "pushable queue".

c) When a runnable task releases the cpu, it stays in the "run queue" (if it
releases the cpu involuntarily, it stays just behind the new current in this
queue; if it releases the cpu voluntarily, e.g. via yield, it is usually
requeued behind other tasks queued at the same priority), and is added to
the tail of its "pushable queue" at the same priority if it is migratable.

d) When a task's affinity is changed from one cpu to multiple cpus, it is
added to the tail of the "pushable queue" at the same priority.

e) When a task's affinity is changed from multiple cpus to one cpu, it is
removed from the "pushable queue".
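
A compact sketch of rules a)-e) (pseudo-C; the helper names here are
hypothetical, the kernel spreads this logic across enqueue_task_rt(),
put_prev_task_rt(), set_cpus_allowed_rt() and friends):

static void on_enqueue(struct rq *rq, struct task_struct *p, bool head)
{
        run_queue_add(rq, p, head);             /* a) head or tail */
        if (p->nr_cpus_allowed > 1)             /* migratable? */
                pushable_queue_add_tail(rq, p); /* a) always the tail */
}

static void on_acquire_cpu(struct rq *rq, struct task_struct *p)
{
        pushable_queue_del(rq, p);              /* b) current isn't pushable */
}

static void on_release_cpu(struct rq *rq, struct task_struct *p)
{
        /*
         * c) p stays in the "run queue" (its position depends on whether
         * the release was voluntary); it becomes pushable again, at the
         * tail of its priority.
         */
        if (p->nr_cpus_allowed > 1)
                pushable_queue_add_tail(rq, p);
}

static void on_affinity_change(struct rq *rq, struct task_struct *p)
{
        if (p->nr_cpus_allowed > 1)
                pushable_queue_add_tail(rq, p); /* d) one -> multiple cpus */
        else
                pushable_queue_del(rq, p);      /* e) multiple -> one cpu */
}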


3. PROBLEMATIC CASES
Unfortunately, in the current kernel implementation, as far as I know, case
c) described in "2. THE WAY THEY WORK" has several scenarios that break the
FIFO order of one of the two queues for tasks queued at the same priority.

- Case 1: current gets preempted by a higher priority task
current is enqueued behind other tasks with the same priority in the
"pushable queue", while it actually stays ahead of those tasks in the "run
queue". Afterwards, if a pull from another cpu or a push from the local cpu
comes along, a task behind current in the "run queue" is removed from the
"pushable queue" and gets running, because the global rt scheduler fetches
tasks from the head of the "pushable queue" when pulling or pushing. For
example, with same-prio tasks A (current) and B ordered A, B in the "run
queue", A's preemption yields the "pushable queue" order B, A, so a pull
picks B first even though A is ahead in the "run queue".
The kernel broke FIFO in this case.

current gets preempted by an equal priority task:
This is done in check_preempt_equal_prio(), and can be divided into 3 sub-cases.
- Case 2 : p is the only task with the same priority as current.
           p preempts current successfully, but the kernel still has the
           same problem as in Case 1 during the preemption.
- Case 3 : p is not the only task with the same priority as current; say
           q was queued before p. In this case we should choose q to do
           the preempting, not p. So the kernel broke FIFO in this case.
- Case 4 : p is going to preempt current, but when the actual preemption
           happens, current isn't pushed away because the system state
           changed. So eventually p becomes the new current and gets
           running, while the previous current is queued back waiting in
           the same "run queue". Obviously, the kernel broke FIFO in this
           case.

NOTE: when a task's affinity is changed and it becomes migratable (see d)
in "2. THE WAY THEY WORK"), in the current kernel implementation it is
queued behind other tasks with the same priority in the "pushable queue".
This may seem contrary to the corresponding order in the "run queue". But
from the "pushable queue"'s perspective, an affinity change that removes a
task from or adds it to the "pushable queue" amounts to a requeue. In my
view, this does not break the FIFO order of the "pushable queue". Any
discussion?

But in "Case 1" and "Case 2", the task was neither removed from nor added
to the "pushable queue" because of an affinity change, so its "pushable
queue" FIFO order should stay unchanged, and must match its "run queue"
order. So we say "Case 1" and "Case 2" broke the FIFO order of the
"pushable queue".

4. SUMMARY
Case 1 : broke its "pushable queue" FIFO order.
Case 2 : broke its "pushable queue" FIFO order.
Case 3 : broke its "run queue" FIFO order.
Case 4 : broke its "run queue" FIFO order.

5. PATCHSET
Patch 1 : Fixes "Case 3".
Patch 2 : Prerequisite for Patch 3 - provides plist_add_head() for the "pushable queue".
Patch 3 : Fixes "Case 1" and "Case 2".
Patch 4 : Fixes "Case 4".

Xunlei Pang (4):
sched/rt: Modify check_preempt_equal_prio() for multiple tasks queued
at the same priority
lib/plist: Provide plist_add_head() for nodes with the same prio
sched/rt: Fix wrong SMP scheduler behavior for equal prio cases
sched/rt: Requeue p back if the preemption initiated by
check_preempt_equal_prio_common() failed

include/linux/plist.h | 34 ++++++-
include/linux/sched.h | 5 +
include/linux/sched/rt.h | 16 +++
kernel/sched/core.c | 6 +-
kernel/sched/rt.c | 253 ++++++++++++++++++++++++++++++++++++++++-------
lib/plist.c | 28 +++++-
6 files changed, 299 insertions(+), 43 deletions(-)

--
1.9.1


2015-04-27 06:50:11

by Xunlei Pang

Subject: [RFC PATCH RESEND 1/4] sched/rt: Modify check_preempt_equal_prio() for multiple tasks queued at the same priority

From: Xunlei Pang <[email protected]>

In check_preempt_equal_prio(), when p is queued, there may be other
tasks already queued at the same priority in the "run queue", so we
should peek at the frontmost one to do the preemption, not p.

This patch modifies check_preempt_equal_prio() accordingly and moves the
preemption work into a new function named
check_preempt_equal_prio_common() to make the logic clearer.

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/rt.c | 70 ++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 575da76..0c0f4df 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1366,33 +1366,66 @@ out:
return cpu;
}

-static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
+static struct task_struct *peek_next_task_rt(struct rq *rq);
+
+static void check_preempt_equal_prio_common(struct rq *rq)
{
+ struct task_struct *curr = rq->curr;
+ struct task_struct *next;
+
+ /* Current can't be migrated, useless to reschedule */
+ if (curr->nr_cpus_allowed == 1 ||
+ !cpupri_find(&rq->rd->cpupri, curr, NULL))
+ return;
+
/*
- * Current can't be migrated, useless to reschedule,
- * let's hope p can move out.
+ * Can we find any task with the same priority as
+ * curr? To accomplish this, firstly requeue curr
+ * to the tail, then peek next, finally put curr
+ * back to the head if a different task was peeked.
*/
- if (rq->curr->nr_cpus_allowed == 1 ||
- !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
+ requeue_task_rt(rq, curr, 0);
+ next = peek_next_task_rt(rq);
+ if (next == curr)
+ return;
+
+ requeue_task_rt(rq, curr, 1);
+
+ if (next->prio != curr->prio)
return;

/*
- * p is migratable, so let's not schedule it and
- * see if it is pushed or pulled somewhere else.
+ * Got the right "next" queued with the same priority
+ * as current. If next is migratable, don't schedule
+ * it as it will be pushed or pulled somewhere else.
*/
- if (p->nr_cpus_allowed != 1
- && cpupri_find(&rq->rd->cpupri, p, NULL))
+ if (next->nr_cpus_allowed != 1 &&
+ cpupri_find(&rq->rd->cpupri, next, NULL))
return;

/*
* There appears to be other cpus that can accept
- * current and none to run 'p', so lets reschedule
- * to try and push current away:
+ * current and none to run next, so lets reschedule
+ * to try and push current away.
*/
- requeue_task_rt(rq, p, 1);
+ requeue_task_rt(rq, next, 1);
resched_curr(rq);
}

+static inline
+void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
+{
+ /*
+ * p is migratable, so let's not schedule it and
+ * see if it is pushed or pulled somewhere else.
+ */
+ if (p->nr_cpus_allowed != 1 &&
+ cpupri_find(&rq->rd->cpupri, p, NULL))
+ return;
+
+ check_preempt_equal_prio_common(rq);
+}
+
#endif /* CONFIG_SMP */

/*
@@ -1440,10 +1473,9 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
return next;
}

-static struct task_struct *_pick_next_task_rt(struct rq *rq)
+static struct task_struct *peek_next_task_rt(struct rq *rq)
{
struct sched_rt_entity *rt_se;
- struct task_struct *p;
struct rt_rq *rt_rq = &rq->rt;

do {
@@ -1452,9 +1484,15 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
rt_rq = group_rt_rq(rt_se);
} while (rt_rq);

- p = rt_task_of(rt_se);
- p->se.exec_start = rq_clock_task(rq);
+ return rt_task_of(rt_se);
+}

+static inline struct task_struct *_pick_next_task_rt(struct rq *rq)
+{
+ struct task_struct *p;
+
+ p = peek_next_task_rt(rq);
+ p->se.exec_start = rq_clock_task(rq);
return p;
}

--
1.9.1

2015-04-27 06:50:38

by Xunlei Pang

Subject: [RFC PATCH RESEND 2/4] lib/plist: Provide plist_add_head() for nodes with the same prio

From: Xunlei Pang <[email protected]>

If there are multiple nodes with the same prio as @node, currently
plist_add() adds @node after all of them. The SMP RT scheduler now
needs to add @node before all of these nodes.

This patch adds a common __plist_add() that can add @node before or
after existing nodes with the same prio, then adds plist_add_head()
and plist_add_tail() inline wrappers for convenience.

Finally, it turns plist_add() into an inline wrapper around
plist_add_tail(), preserving the previous behaviour.
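
A minimal usage sketch of the resulting API (kernel context assumed;
locking omitted, plist_head_init()/plist_node_init() are the existing
plist initializers):

#include <linux/plist.h>

static struct plist_head head;
static struct plist_node a, b, c;

static void plist_order_demo(void)
{
        plist_head_init(&head);
        plist_node_init(&a, 10);        /* all three share prio 10 */
        plist_node_init(&b, 10);
        plist_node_init(&c, 10);

        plist_add_tail(&a, &head);      /* queue: a       */
        plist_add_tail(&b, &head);      /* queue: a, b    */
        plist_add_head(&c, &head);      /* queue: c, a, b */
}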

Reviewed-by: Dan Streetman <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Signed-off-by: Xunlei Pang <[email protected]>
---
include/linux/plist.h | 34 +++++++++++++++++++++++++++++++++-
lib/plist.c | 28 +++++++++++++++++++++++-----
2 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/include/linux/plist.h b/include/linux/plist.h
index 9788360..d060f28 100644
--- a/include/linux/plist.h
+++ b/include/linux/plist.h
@@ -138,7 +138,39 @@ static inline void plist_node_init(struct plist_node *node, int prio)
INIT_LIST_HEAD(&node->node_list);
}

-extern void plist_add(struct plist_node *node, struct plist_head *head);
+extern void __plist_add(struct plist_node *node,
+ struct plist_head *head, bool is_head);
+
+/**
+ * plist_add_head - add @node to @head, before all existing same-prio nodes
+ *
+ * @node: The plist_node to be added to @head
+ * @head: The plist_head that @node is being added to
+ */
+static inline
+void plist_add_head(struct plist_node *node, struct plist_head *head)
+{
+ __plist_add(node, head, true);
+}
+
+/**
+ * plist_add_tail - add @node to @head, after all existing same-prio nodes
+ *
+ * @node: The plist_node to be added to @head
+ * @head: The plist_head that @node is being added to
+ */
+static inline
+void plist_add_tail(struct plist_node *node, struct plist_head *head)
+{
+ __plist_add(node, head, false);
+}
+
+static inline
+void plist_add(struct plist_node *node, struct plist_head *head)
+{
+ plist_add_tail(node, head);
+}
+
extern void plist_del(struct plist_node *node, struct plist_head *head);

extern void plist_requeue(struct plist_node *node, struct plist_head *head);
diff --git a/lib/plist.c b/lib/plist.c
index 3a30c53..c1ee2b0 100644
--- a/lib/plist.c
+++ b/lib/plist.c
@@ -66,12 +66,18 @@ static void plist_check_head(struct plist_head *head)
#endif

/**
- * plist_add - add @node to @head
+ * __plist_add - add @node to @head
*
- * @node: &struct plist_node pointer
- * @head: &struct plist_head pointer
+ * @node: The plist_node to be added to @head
+ * @head: The plist_head that @node is being added to
+ * @is_head: True if adding to head of prio list, false otherwise
+ *
+ * For nodes of the same prio, @node will be added at the
+ * head of previously added nodes if @is_head is true, or
+ * it will be added at the tail of previously added nodes
+ * if @is_head is false.
*/
-void plist_add(struct plist_node *node, struct plist_head *head)
+void __plist_add(struct plist_node *node, struct plist_head *head, bool is_head)
{
struct plist_node *first, *iter, *prev = NULL;
struct list_head *node_next = &head->node_list;
@@ -96,8 +102,20 @@ void plist_add(struct plist_node *node, struct plist_head *head)
struct plist_node, prio_list);
} while (iter != first);

- if (!prev || prev->prio != node->prio)
+ if (!prev || prev->prio != node->prio) {
list_add_tail(&node->prio_list, &iter->prio_list);
+ } else if (is_head) {
+ /*
+ * prev has the same priority as the node that is being
+ * added. It is also the first node for this priority,
+ * but the new node needs to be added ahead of it.
+ * To accomplish this, replace prev in the prio_list
+ * with node. Then set node_next to prev->node_list so
+ * that the new node gets added before prev and not iter.
+ */
+ list_replace_init(&prev->prio_list, &node->prio_list);
+ node_next = &prev->node_list;
+ }
ins_node:
list_add_tail(&node->node_list, node_next);

--
1.9.1

2015-04-27 06:51:06

by Xunlei Pang

Subject: [RFC PATCH RESEND 3/4] sched/rt: Fix wrong SMP scheduler behavior for equal prio cases

From: Xunlei Pang <[email protected]>

As we know, there are two main queues on each cpu for the RT scheduler;
let's call them "run queue" and "pushable queue" respectively.

For RT tasks, the scheduler uses a "plist" to manage the pushable queue,
so when multiple tasks are queued at the same priority, they are kept in
strict FIFO order.

Currently, when an rt task gets queued, it is put at the head or the
tail of its "run queue" at the same priority, depending on the scenario.
If it is migratable, it is also and always put at the tail of its
"pushable queue" at the same priority.

For one cpu, assume that initially it has some migratable tasks queued
at the same priority as current (RT), in the same order in both the "run
queue" and the "pushable queue". At some point, when current gets
preempted, it is put behind those tasks in the "pushable queue", while it
still stays ahead of them in the "run queue". Afterwards, if a pull from
another cpu or a push from the local cpu comes along, a task behind
current in the "run queue" is removed from the "pushable queue" and gets
running, because the global rt scheduler fetches tasks from the head of
the "pushable queue" when pulling or pushing.

Obviously, to maintain the right order in both queues, when current is
preempted (not involving a requeue within the "run queue"), we want to
put it ahead of all tasks queued at the same priority in the "pushable
queue".

So, if a task is running and gets preempted by a higher priority task,
or even by an equal priority task for migration purposes, this patch
ensures that it is put ahead of any existing task with the same priority
in the "pushable queue".

The handling logic used here (a condensed sketch follows the list):
- Add a new field named "rt_preempt" to task_struct (with a new flag
  named RT_PREEMPT_QUEUEAHEAD), used by RT.
- When a preemption-induced resched_curr() is done on the current RT
  task, set the flag. A new resched_curr_preempted_rt() is created for
  this purpose, and all resched_curr() calls used for rt preemption are
  replaced with it.
- In put_prev_task_rt(), test RT_PREEMPT_QUEUEAHEAD; if it is set,
  enqueue the task at the head of its priority in the "pushable queue"
  and clear the flag.
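
Condensed, the resulting flow looks like this (assembled from the hunks
below; simplified, locking omitted):

void resched_curr_preempted_rt(struct rq *rq)
{
        /* Mark current: it was preempted and must keep its FIFO slot. */
        if (rt_task(rq->curr))
                rq->curr->rt_preempt |= RT_PREEMPT_QUEUEAHEAD;

        resched_curr(rq);
}

static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
{
        /* ... */
        if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1) {
                if (rt_preempted(p))
                        /* Preempted: go ahead of same-prio tasks. */
                        enqueue_pushable_task_preempted(rq, p);
                else
                        enqueue_pushable_task(rq, p);
        }

        clear_rt_preempted(p);
}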

Signed-off-by: Xunlei Pang <[email protected]>
---
include/linux/sched.h | 5 +++
include/linux/sched/rt.h | 16 ++++++++
kernel/sched/core.c | 6 ++-
kernel/sched/rt.c | 96 ++++++++++++++++++++++++++++++++++++++++++------
4 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f74d4cc..24e0f72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1321,6 +1321,11 @@ struct task_struct {
const struct sched_class *sched_class;
struct sched_entity se;
struct sched_rt_entity rt;
+
+#ifdef CONFIG_SMP
+ unsigned long rt_preempt; /* Used by rt */
+#endif
+
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
#endif
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 6341f5b..69e3c82 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -15,6 +15,22 @@ static inline int rt_task(struct task_struct *p)
return rt_prio(p->prio);
}

+struct rq;
+
+#ifdef CONFIG_SMP
+extern void resched_curr_preempted_rt(struct rq *rq);
+
+static inline void resched_curr_preempted(struct rq *rq)
+{
+ resched_curr_preempted_rt(rq);
+}
+#else
+static inline void resched_curr_preempted(struct rq *rq)
+{
+ resched_curr(rq);
+}
+#endif
+
#ifdef CONFIG_RT_MUTEXES
extern int rt_mutex_getprio(struct task_struct *p);
extern void rt_mutex_setprio(struct task_struct *p, int prio);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..d13fc13 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1002,7 +1002,7 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
if (class == rq->curr->sched_class)
break;
if (class == p->sched_class) {
- resched_curr(rq);
+ resched_curr_preempted(rq);
break;
}
}
@@ -1833,6 +1833,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)

INIT_LIST_HEAD(&p->rt.run_list);

+#ifdef CONFIG_SMP
+ p->rt_preempt = 0;
+#endif
+
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
#endif
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0c0f4df..7439121 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -254,8 +254,33 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
}
#endif /* CONFIG_RT_GROUP_SCHED */

+
#ifdef CONFIG_SMP

+#define RT_PREEMPT_QUEUEAHEAD 1UL
+
+/*
+ * p(current) was preempted, and to be put ahead of
+ * any task with the same priority in pushable queue.
+ */
+static inline bool rt_preempted(struct task_struct *p)
+{
+ return !!(p->rt_preempt & RT_PREEMPT_QUEUEAHEAD);
+}
+
+static inline void clear_rt_preempted(struct task_struct *p)
+{
+ p->rt_preempt = 0;
+}
+
+void resched_curr_preempted_rt(struct rq *rq)
+{
+ if (rt_task(rq->curr))
+ rq->curr->rt_preempt |= RT_PREEMPT_QUEUEAHEAD;
+
+ resched_curr(rq);
+}
+
static int pull_rt_task(struct rq *this_rq);

static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
@@ -359,17 +384,32 @@ static inline void set_post_schedule(struct rq *rq)
rq->post_schedule = has_pushable_tasks(rq);
}

-static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+static void
+__enqueue_pushable_task(struct rq *rq, struct task_struct *p, bool head)
{
plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
plist_node_init(&p->pushable_tasks, p->prio);
- plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
+ if (head)
+ plist_add_head(&p->pushable_tasks, &rq->rt.pushable_tasks);
+ else
+ plist_add_tail(&p->pushable_tasks, &rq->rt.pushable_tasks);

/* Update the highest prio pushable task */
if (p->prio < rq->rt.highest_prio.next)
rq->rt.highest_prio.next = p->prio;
}

+static inline
+void enqueue_pushable_task_preempted(struct rq *rq, struct task_struct *curr)
+{
+ __enqueue_pushable_task(rq, curr, true);
+}
+
+static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+{
+ __enqueue_pushable_task(rq, p, false);
+}
+
static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
{
plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
@@ -385,6 +425,25 @@ static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)

#else

+static inline bool rt_preempted(struct task_struct *p)
+{
+ return false;
+}
+
+static inline void clear_rt_preempted(struct task_struct *p)
+{
+}
+
+static inline void resched_curr_preempted_rt(struct rq *rq)
+{
+ resched_curr(rq);
+}
+
+static inline
+void enqueue_pushable_task_preempted(struct rq *rq, struct task_struct *p)
+{
+}
+
static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
{
}
@@ -489,7 +548,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
enqueue_rt_entity(rt_se, false);

if (rt_rq->highest_prio.curr < curr->prio)
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}
}

@@ -967,7 +1026,7 @@ static void update_curr_rt(struct rq *rq)
raw_spin_lock(&rt_rq->rt_runtime_lock);
rt_rq->rt_time += delta_exec;
if (sched_rt_runtime_exceeded(rt_rq))
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
raw_spin_unlock(&rt_rq->rt_runtime_lock);
}
}
@@ -1409,7 +1468,7 @@ static void check_preempt_equal_prio_common(struct rq *rq)
* to try and push current away.
*/
requeue_task_rt(rq, next, 1);
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}

static inline
@@ -1434,7 +1493,7 @@ void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flags)
{
if (p->prio < rq->curr->prio) {
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
return;
}

@@ -1544,8 +1603,21 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
* The previous task needs to be made eligible for pushing
* if it is still active
*/
- if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
- enqueue_pushable_task(rq, p);
+ if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1) {
+ /*
+ * When put_prev_task_rt() is called by
+ * pick_next_task_rt(), if the current rt task
+ * is being preempted, to maintain FIFO, it must
+ * stay ahead of any other task that is queued
+ * at the same priority.
+ */
+ if (rt_preempted(p))
+ enqueue_pushable_task_preempted(rq, p);
+ else
+ enqueue_pushable_task(rq, p);
+ }
+
+ clear_rt_preempted(p);
}

#ifdef CONFIG_SMP
@@ -1764,7 +1836,7 @@ retry:
* just reschedule current.
*/
if (unlikely(next_task->prio < rq->curr->prio)) {
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
return 0;
}

@@ -1811,7 +1883,7 @@ retry:
activate_task(lowest_rq, next_task, 0);
ret = 1;

- resched_curr(lowest_rq);
+ resched_curr_preempted_rt(lowest_rq);

double_unlock_balance(rq, lowest_rq);

@@ -2213,7 +2285,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
check_resched = 0;
#endif /* CONFIG_SMP */
if (check_resched && p->prio < rq->curr->prio)
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}
}

@@ -2255,7 +2327,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
* then reschedule.
*/
if (p->prio < rq->curr->prio)
- resched_curr(rq);
+ resched_curr_preempted_rt(rq);
}
}

--
1.9.1

2015-04-27 06:50:43

by Xunlei Pang

Subject: [RFC PATCH RESEND 4/4] sched/rt: Requeue p back if the preemption initiated by check_preempt_equal_prio_common() failed

From: Xunlei Pang <[email protected]>

check_preempt_equal_prio_common() requeues "next" to the head of the
"run queue" and wants to push current away. But when the actual push is
done, the system state may have changed, and the push may fail as a
result.

In this case, p finally becomes the new current and gets running, while
the previous current is queued back waiting in the same "run queue".
This broke FIFO.

This patch adds a flag named RT_PREEMPT_PUSHAWAY to
task_struct::rt_preempt, sets it in check_preempt_equal_prio_common(),
and clears it when the task is dequeued (a successful push dequeues it
from this rq). We can then test this flag in the new current's
post_schedule_rt() to judge whether the push happened. If the push
failed, requeue the previous current back to the head of its "run queue"
and trigger a reschedule (a condensed sketch follows).
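
Condensed, the recovery path looks like this (assembled from the hunks
below; simplified, locking omitted):

static void post_schedule_rt(struct rq *rq)
{
        struct task_struct *target;

        push_rt_tasks(rq);

        if (!rt_preempting(current))
                return;

        /* current was installed by check_preempt_equal_prio_common(). */
        target = rt_preempting_target(current);
        current->rt_preempt = 0;

        /*
         * If target still carries RT_PREEMPT_PUSHAWAY and is still queued
         * on this rq, the push failed: restore the former FIFO order.
         */
        if ((target->rt_preempt & RT_PREEMPT_PUSHAWAY) &&
            task_on_rq_queued(target) && task_cpu(target) == rq->cpu) {
                /* target is the previous current, requeue it back ahead. */
                requeue_task_rt(rq, target, 1);
                /* Preempt current, looping back to __schedule(). */
                resched_curr_preempted_rt(rq);
        }

        put_task_struct(target);
}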

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/rt.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7439121..d1cecd6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -258,6 +258,8 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
#ifdef CONFIG_SMP

#define RT_PREEMPT_QUEUEAHEAD 1UL
+#define RT_PREEMPT_PUSHAWAY 2UL
+#define RT_PREEMPT_MASK 3UL

/*
* p(current) was preempted, and to be put ahead of
@@ -273,6 +275,30 @@ static inline void clear_rt_preempted(struct task_struct *p)
p->rt_preempt = 0;
}

+static inline struct task_struct *rt_preempting_target(struct task_struct *p)
+{
+ return (struct task_struct *) (p->rt_preempt & ~RT_PREEMPT_MASK);
+}
+
+/*
+ * p(new current) is preempting and pushing previous current away.
+ */
+static inline bool rt_preempting(struct task_struct *p)
+{
+ if ((p->rt_preempt & RT_PREEMPT_PUSHAWAY) && rt_preempting_target(p))
+ return true;
+
+ return false;
+}
+
+static inline void clear_rt_preempting(struct task_struct *p)
+{
+ if (rt_preempting(p))
+ put_task_struct(rt_preempting_target(p));
+
+ p->rt_preempt = 0;
+}
+
void resched_curr_preempted_rt(struct rq *rq)
{
if (rt_task(rq->curr))
@@ -375,13 +401,17 @@ static inline int has_pushable_tasks(struct rq *rq)
return !plist_head_empty(&rq->rt.pushable_tasks);
}

-static inline void set_post_schedule(struct rq *rq)
+static inline void set_post_schedule(struct rq *rq, struct task_struct *p)
{
- /*
- * We detect this state here so that we can avoid taking the RQ
- * lock again later if there is no need to push
- */
- rq->post_schedule = has_pushable_tasks(rq);
+ if (rt_preempting(p))
+ /* Forced post schedule */
+ rq->post_schedule = 1;
+ else
+ /*
+ * We detect this state here so that we can avoid taking
+ * the RQ lock again later if there is no need to push
+ */
+ rq->post_schedule = has_pushable_tasks(rq);
}

static void
@@ -434,6 +464,15 @@ static inline void clear_rt_preempted(struct task_struct *p)
{
}

+static inline bool rt_preempting(struct task_struct *p)
+{
+ return false;
+}
+
+static inline void clear_rt_preempting(struct task_struct *p)
+{
+}
+
static inline void resched_curr_preempted_rt(struct rq *rq)
{
resched_curr(rq);
@@ -472,7 +511,7 @@ static inline int pull_rt_task(struct rq *this_rq)
return 0;
}

-static inline void set_post_schedule(struct rq *rq)
+static inline void set_post_schedule(struct rq *rq, struct task_struct *p)
{
}
#endif /* CONFIG_SMP */
@@ -1330,6 +1369,7 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
dequeue_rt_entity(rt_se);

dequeue_pushable_task(rq, p);
+ clear_rt_preempting(p);
}

/*
@@ -1468,6 +1508,11 @@ static void check_preempt_equal_prio_common(struct rq *rq)
* to try and push current away.
*/
requeue_task_rt(rq, next, 1);
+
+ get_task_struct(curr);
+ curr->rt_preempt |= RT_PREEMPT_PUSHAWAY;
+ next->rt_preempt = (unsigned long) curr;
+ next->rt_preempt |= RT_PREEMPT_PUSHAWAY;
resched_curr_preempted_rt(rq);
}

@@ -1590,7 +1635,7 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
/* The running task is never eligible for pushing */
dequeue_pushable_task(rq, p);

- set_post_schedule(rq);
+ set_post_schedule(rq, p);

return p;
}
@@ -2151,6 +2196,32 @@ skip:
static void post_schedule_rt(struct rq *rq)
{
push_rt_tasks(rq);
+
+ if (rt_preempting(current)) {
+ struct task_struct *target;
+
+ target = rt_preempting_target(current);
+ current->rt_preempt = 0;
+ if (!(target->rt_preempt & RT_PREEMPT_PUSHAWAY))
+ goto out;
+
+ /*
+ * target still has RT_PREEMPT_PUSHAWAY set which
+ * means it wasn't pushed away successfully if it
+ * is still on this rq. thus restore former status
+ * of current and target if so.
+ */
+ if (!task_on_rq_queued(target) ||
+ task_cpu(target) != rq->cpu)
+ goto out;
+
+ /* target is previous current, requeue it back ahead. */
+ requeue_task_rt(rq, target, 1);
+ /* Let's preempt current, loop back to __schedule(). */
+ resched_curr_preempted_rt(rq);
+out:
+ put_task_struct(target);
+ }
}

/*
--
1.9.1