LinuxLists.cc - [PATCH v2 1/2] sched/rt: Check to push task away when its affinity is changed

2015-05-05 11:57:55

Subject: [PATCH v2 1/2] sched/rt: Check to push task away when its affinity is changed

From: Xunlei Pang <[email protected]>

A runqueue can end up rt-overloaded purely because of task
affinity, so when the affinity of any runnable rt task changes
we should check whether to trigger balancing; otherwise
real-time response is delayed unnecessarily. Unfortunately, the
current RT global scheduler does nothing about this.

For example: on a 2-cpu system, two runnable FIFO tasks (same
rt_priority) are bound to CPU0; let's name them rt1 (running) and
rt2 (runnable) respectively. CPU1 has no RT tasks. Then someone sets
the affinity of rt2 to 0x3 (i.e. CPU0 and CPU1), but after this
rt2 still can't be scheduled until rt1 enters schedule(), which
causes a potentially large response latency for rt2.

This patch introduces a new sched_class::post_set_cpus_allowed()
hook for RT, called after set_cpus_allowed_rt(). In this new
function, if the task is runnable but not running, we try to push
it away now that it has become migratable.
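
The check described above can be sketched as a small userspace model. This is only an illustration of the predicate, not kernel code: struct model_task, model_weight() and model_should_push() are hypothetical names invented here, and the affinity mask is assumed to fit in one unsigned int.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the task state the hook inspects. */
struct model_task {
	bool queued;            /* models task_on_rq_queued(p) */
	bool running;           /* models task_running(rq, p) */
	unsigned int cpus_mask; /* models p->cpus_allowed, one bit per CPU */
};

/* Count the set bits in the mask; models cpumask_weight(). */
unsigned int model_weight(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Returns true in the cases where the patch would call push_rt_tasks(rq).
 * curr_need_resched models test_tsk_need_resched(rq->curr).
 */
bool model_should_push(const struct model_task *p, bool curr_need_resched)
{
	assert(p);

	if (!p->queued)
		return false;

	return !p->running &&
	       model_weight(p->cpus_mask) > 1 &&
	       !curr_need_resched;
}
```

In the rt1/rt2 example, rt2 is queued and not running: with mask 0x1 the model reports no push, and after the mask becomes 0x3 it reports a push unless the current task is already marked for rescheduling.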

The patch also solves a problem with move_queued_task() being
called in set_cpus_allowed_ptr():
When a lower priority rt task is migrated because its current cpu
isn't in the new affinity mask, it misses the chance of being
pushed away after move_queued_task(), because check_preempt_curr()
called by move_queued_task() doesn't set the "need resched flag"
for lower priority tasks.

Parts-suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Xunlei Pang <[email protected]>
---
v1->v2:
Removed cpupri_find(), as it will probably be executed in push_rt_tasks().

kernel/sched/core.c | 3 +++
kernel/sched/rt.c | 15 +++++++++++++++
kernel/sched/sched.h | 1 +
3 files changed, 19 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d13fc13..64a1603 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4773,6 +4773,9 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)

 	cpumask_copy(&p->cpus_allowed, new_mask);
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
+
+	if (p->sched_class->post_set_cpus_allowed)
+		p->sched_class->post_set_cpus_allowed(p);
 }

/*
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8885b65..4176f33 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2280,6 +2280,20 @@ static void set_cpus_allowed_rt(struct task_struct *p,
 	update_rt_migration(&rq->rt);
 }
 
+static void post_set_cpus_allowed_rt(struct task_struct *p)
+{
+	struct rq *rq;
+
+	if (!task_on_rq_queued(p))
+		return;
+
+	rq = task_rq(p);
+	if (!task_running(rq, p) &&
+	    p->nr_cpus_allowed > 1 &&
+	    !test_tsk_need_resched(rq->curr))
+		push_rt_tasks(rq);
+}
+
/* Assumes rq->lock is held */
static void rq_online_rt(struct rq *rq)
{
@@ -2494,6 +2508,7 @@ const struct sched_class rt_sched_class = {
.select_task_rq = select_task_rq_rt,

.set_cpus_allowed = set_cpus_allowed_rt,
+ .post_set_cpus_allowed = post_set_cpus_allowed_rt,
.rq_online = rq_online_rt,
.rq_offline = rq_offline_rt,
.post_schedule = post_schedule_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..6f90645 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1191,6 +1191,7 @@ struct sched_class {

void (*set_cpus_allowed)(struct task_struct *p,
const struct cpumask *newmask);
+ void (*post_set_cpus_allowed)(struct task_struct *p);

void (*rq_online)(struct rq *rq);
void (*rq_offline)(struct rq *rq);
--
1.9.1

2015-05-05 11:57:44

by Xunlei Pang

Subject: [PATCH v2 2/2] sched/rt: Remove redundant conditions from task_woken_rt()

From: Xunlei Pang <[email protected]>

- Remove "has_pushable_tasks(rq)".
Because for queued p, "!task_running(rq, p)" and "p->nr_cpus_allowed > 1"
already imply that "has_pushable_tasks(rq)" is true.

- Remove "!test_tsk_need_resched(rq->curr)".
The condition mainly intends to ensure that higher priority rt tasks won't be
pushed away. I can think of two reasons for getting rid of it.
1) The following "rq->curr->prio <= p->prio" check still guarantees that
purpose. "rq->curr->prio <= p->prio" implies that the "need resched flag" wasn't
set by check_preempt_curr(), except when check_preempt_equal_prio() sets it
for equal prio cases. (In that case, removing the condition may result in an
extra push_rt_tasks(), but this doesn't break the logic; in fact the extra
push_rt_tasks() will probably return quickly.)

Additionally, there are cases where the "need resched flag" was set before the
wakeup. With the current implementation there is no need to push lower priority
tasks because the cpu will schedule anyway, whereas removing the condition
causes an extra push. On the other hand, removing the condition gives the woken
tasks a timely push (better for the non-preemptible kernel).

2) The following condition "rq->curr->nr_cpus_allowed < 2" was added by
commit b3bc211cfe7d ("sched: Give CPU bound RT tasks preference"). But in the
scenario described in that commit, the "need resched flag" was already set
earlier in check_preempt_curr(), so "!test_tsk_need_resched(rq->curr)" is always
false, which means that with the current implementation the commit is futile
for task_woken_rt(). So, by removing this condition, we get the right logic.
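
The argument above can be checked against a small userspace model that enumerates the inputs of the task_woken_rt() condition before and after the change. This is a sketch under stated assumptions, not kernel code: struct woken_case and the model_* functions are hypothetical names, and the model assumes that a queued, non-running task with more than one allowed CPU is itself on the pushable list, which is what the first bullet relies on.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state task_woken_rt() looks at. */
struct woken_case {
	bool p_running;         /* task_running(rq, p) */
	bool curr_need_resched; /* test_tsk_need_resched(rq->curr) */
	bool others_pushable;   /* rq has pushable tasks other than p */
	int p_nr_cpus;          /* p->nr_cpus_allowed */
	bool curr_is_rt_or_dl;  /* dl_task(rq->curr) || rt_task(rq->curr) */
	int curr_nr_cpus;       /* rq->curr->nr_cpus_allowed */
	int curr_prio;          /* rq->curr->prio, lower value = higher priority */
	int p_prio;             /* p->prio */
};

/* Assumption: a qualifying p is itself a pushable task on the rq. */
bool model_has_pushable(const struct woken_case *c)
{
	return c->others_pushable || (!c->p_running && c->p_nr_cpus > 1);
}

/* Conditions kept by the patch. */
bool model_cond_new(const struct woken_case *c)
{
	return !c->p_running &&
	       c->p_nr_cpus > 1 &&
	       c->curr_is_rt_or_dl &&
	       (c->curr_nr_cpus < 2 || c->curr_prio <= c->p_prio);
}

/* Conditions before the patch: the same plus the two removed checks. */
bool model_cond_old(const struct woken_case *c)
{
	return !c->curr_need_resched &&
	       model_has_pushable(c) &&
	       model_cond_new(c);
}

/* Count the enumerated cases where the old and new conditions disagree. */
int model_count_divergent(bool curr_need_resched)
{
	int n = 0;

	for (int bits = 0; bits < 32; bits++)
		for (int cp = 0; cp < 3; cp++)
			for (int pp = 0; pp < 3; pp++) {
				struct woken_case c = {
					.p_running = bits & 1,
					.curr_need_resched = curr_need_resched,
					.others_pushable = (bits >> 1) & 1,
					.p_nr_cpus = 1 + ((bits >> 2) & 1),
					.curr_is_rt_or_dl = (bits >> 3) & 1,
					.curr_nr_cpus = 1 + ((bits >> 4) & 1),
					.curr_prio = cp,
					.p_prio = pp,
				};

				if (model_cond_old(&c) != model_cond_new(&c))
					n++;
			}
	return n;
}
```

Within this model the two conditions never disagree while the need resched flag is clear, so dropping has_pushable_tasks(rq) changes nothing; every disagreement comes from cases where the flag was already set, which are the cases the changelog discusses.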

Signed-off-by: Xunlei Pang <[email protected]>
---
v1->v2:
Improved the changelog.

kernel/sched/rt.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4176f33..95b596b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2232,8 +2232,6 @@ out:
 static void task_woken_rt(struct rq *rq, struct task_struct *p)
 {
 	if (!task_running(rq, p) &&
-	    !test_tsk_need_resched(rq->curr) &&
-	    has_pushable_tasks(rq) &&
 	    p->nr_cpus_allowed > 1 &&
 	    (dl_task(rq->curr) || rt_task(rq->curr)) &&
 	    (rq->curr->nr_cpus_allowed < 2 ||
--
1.9.1

2015-05-05 12:10:16

by Peter Zijlstra

Subject: Re: [PATCH v2 1/2] sched/rt: Check to push task away when its affinity is changed

On Tue, May 05, 2015 at 07:56:07PM +0800, Xunlei Pang wrote:
> +++ b/kernel/sched/core.c
> @@ -4773,6 +4773,9 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>
> cpumask_copy(&p->cpus_allowed, new_mask);
> p->nr_cpus_allowed = cpumask_weight(new_mask);
> +
> + if (p->sched_class->post_set_cpus_allowed)
> + p->sched_class->post_set_cpus_allowed(p);
> }
>
> /*
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 8885b65..4176f33 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2280,6 +2280,20 @@ static void set_cpus_allowed_rt(struct task_struct *p,
> update_rt_migration(&rq->rt);
> }
>
> +static void post_set_cpus_allowed_rt(struct task_struct *p)
> +{
> + struct rq *rq;
> +
> + if (!task_on_rq_queued(p))
> + return;
> +
> + rq = task_rq(p);
> + if (!task_running(rq, p) &&
> + p->nr_cpus_allowed > 1 &&
> + !test_tsk_need_resched(rq->curr))
> + push_rt_tasks(rq);
> +}

Guys, this is disgusting. Please don't do these minimal effort hacks.

Either fix up all the classes with a trivial set_cpus_allowed() function
and make do_set_cpus_allowed() := p->sched_class->set_cpus_allowed().

Or just do the p->{nr_,}cpus_allowed assignments in
set_cpus_allowed_rt() and keep it all in the one callback.
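
The first alternative (every class gets a set_cpus_allowed() callback and do_set_cpus_allowed() only dispatches) might look roughly like the following userspace model. It is a sketch, not the kernel implementation: all demo_* names are hypothetical, the affinity mask is a plain unsigned int, and push_attempts merely counts where push_rt_tasks() would be called.

```c
#include <stdbool.h>

struct demo_task;

/* Hypothetical miniature of struct sched_class with a mandatory callback. */
struct demo_class {
	void (*set_cpus_allowed)(struct demo_task *p, unsigned int new_mask);
};

struct demo_task {
	const struct demo_class *cls;
	unsigned int cpus_allowed; /* models p->cpus_allowed */
	int nr_cpus_allowed;       /* models p->nr_cpus_allowed */
	bool queued;               /* models task_on_rq_queued(p) */
	bool running;              /* models task_running(rq, p) */
	int push_attempts;         /* counts modelled push_rt_tasks() calls */
};

/* Count the set bits in the mask; models cpumask_weight(). */
int demo_weight(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Trivial callback: only the assignments the core function does today. */
void demo_set_cpus_allowed_common(struct demo_task *p, unsigned int new_mask)
{
	p->cpus_allowed = new_mask;
	p->nr_cpus_allowed = demo_weight(new_mask);
}

/* RT callback: update the affinity first, then decide whether to push. */
void demo_set_cpus_allowed_rt(struct demo_task *p, unsigned int new_mask)
{
	demo_set_cpus_allowed_common(p, new_mask);

	if (p->queued && !p->running && p->nr_cpus_allowed > 1)
		p->push_attempts++;
}

const struct demo_class demo_fair_class = {
	.set_cpus_allowed = demo_set_cpus_allowed_common,
};

const struct demo_class demo_rt_class = {
	.set_cpus_allowed = demo_set_cpus_allowed_rt,
};

/* The core function reduces to a single dispatch. */
void demo_do_set_cpus_allowed(struct demo_task *p, unsigned int new_mask)
{
	p->cls->set_cpus_allowed(p, new_mask);
}
```

With this shape no second hook is needed: the class sees the new mask, stores it, and can act on the updated state in the same call.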

2015-05-08 17:28:06

by Steven Rostedt

Subject: Re: [PATCH v2 1/2] sched/rt: Check to push task away when its affinity is changed

On Tue, 5 May 2015 23:17:30 +0800
[email protected] wrote:

> > Or just do the p->{nr_,}cpus_allowed assignments in
> > set_cpus_allowed_rt() and keep it all in the one callback.
>
> Ok, thanks.
>
> How about this?

This is something more like I had in mind.

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d13fc13..c995a02 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4768,11 +4768,15 @@ static struct rq *move_queued_task(struct
> task_struct *p, int new_cpu)
>
> void do_set_cpus_allowed(struct task_struct *p, const struct cpumask
> *new_mask)
> {
> + bool updated = false;
> +
> if (p->sched_class->set_cpus_allowed)
> - p->sched_class->set_cpus_allowed(p, new_mask);
> + updated = p->sched_class->set_cpus_allowed(p, new_mask);
>
> - cpumask_copy(&p->cpus_allowed, new_mask);
> - p->nr_cpus_allowed = cpumask_weight(new_mask);
> + if (!updated) {
> + cpumask_copy(&p->cpus_allowed, new_mask);
> + p->nr_cpus_allowed = cpumask_weight(new_mask);
> + }

I'm fine with this if Peter is.

> }
>
> /*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5e95145..3baffb2 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1574,7 +1574,7 @@ static void task_woken_dl(struct rq *rq, struct
> task_struct *p)
> }
> }
>
> -static void set_cpus_allowed_dl(struct task_struct *p,
> +static bool set_cpus_allowed_dl(struct task_struct *p,
> const struct cpumask *new_mask)
> {
> struct rq *rq;
> @@ -1610,7 +1610,7 @@ static void set_cpus_allowed_dl(struct task_struct
> *p,
> * it is on the rq AND it is not throttled).
> */
> if (!on_dl_rq(&p->dl))
> - return;
> + return false;
>

I would think SCHED_DEADLINE tasks would need the same "feature".

> weight = cpumask_weight(new_mask);
>
> @@ -1619,7 +1619,7 @@ static void set_cpus_allowed_dl(struct task_struct
> *p,
> * can migrate or not.
> */
> if ((p->nr_cpus_allowed > 1) == (weight > 1))
> - return;
> + return false;
>
> /*
> * The process used to be able to migrate OR it can now migrate
> @@ -1636,6 +1636,8 @@ static void set_cpus_allowed_dl(struct task_struct
> *p,
> }
>
> update_dl_migration(&rq->dl);
> +
> + return false;
> }
>
> /* Assumes rq->lock is held */
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 8885b65..9e7a4bb 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2241,7 +2241,7 @@ static void task_woken_rt(struct rq *rq, struct
> task_struct *p)
> push_rt_tasks(rq);
> }
>
> -static void set_cpus_allowed_rt(struct task_struct *p,
> +static bool set_cpus_allowed_rt(struct task_struct *p,
> const struct cpumask *new_mask)
> {
> struct rq *rq;
> @@ -2250,18 +2250,18 @@ static void set_cpus_allowed_rt(struct task_struct
> *p,
> BUG_ON(!rt_task(p));
>
> if (!task_on_rq_queued(p))
> - return;
> + return false;
>
> weight = cpumask_weight(new_mask);
>
> + rq = task_rq(p);
> +
> /*
> * Only update if the process changes its state from whether it
> * can migrate or not.

Comment needs updating.

> */
> if ((p->nr_cpus_allowed > 1) == (weight > 1))
> - return;
> -
> - rq = task_rq(p);
> + goto check_push;
>
> /*
> * The process used to be able to migrate OR it can now migrate
> @@ -2278,6 +2278,18 @@ static void set_cpus_allowed_rt(struct task_struct
> *p,
> }
>
> update_rt_migration(&rq->rt);
> +
> +check_push:
> + if (weight > 1 && !task_running(rq, p) &&
> + !cpumask_subset(new_mask, &p->cpus_allowed)) {
> + /* Update new affinity for pushing */
> + cpumask_copy(&p->cpus_allowed, new_mask);
> + p->nr_cpus_allowed = weight;
> + push_rt_tasks(rq);
> + return true;
> + }
> +
> + return false;
> }
>
> /* Assumes rq->lock is held */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index e0e1299..75f869b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1189,7 +1189,8 @@ struct sched_class {
> void (*task_waking) (struct task_struct *task);
> void (*task_woken) (struct rq *this_rq, struct task_struct *task);
>
> - void (*set_cpus_allowed)(struct task_struct *p,
> + /* If p's affinity was updated by it, return true. Otherwise false
> */

/* Return true if p's affinity was updated, false otherwise */

-- Steve

> + bool (*set_cpus_allowed)(struct task_struct *p,
> const struct cpumask *newmask);
>
> void (*rq_online)(struct rq *rq);
>
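
For readers following the proposal quoted above, the bool-returning protocol can be modelled in userspace as below. This is a hedged sketch and not the patch itself: the sketch_* names are hypothetical, the mask is a plain unsigned int, push_attempts stands in for push_rt_tasks(), and core_updates is a counter added only to make the control flow observable.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_task;

/* Hypothetical miniature of struct sched_class; the callback may be NULL. */
struct sketch_class {
	/* Return true if p's affinity was updated, false otherwise. */
	bool (*set_cpus_allowed)(struct sketch_task *p, unsigned int new_mask);
};

struct sketch_task {
	const struct sketch_class *cls;
	unsigned int cpus_allowed; /* models p->cpus_allowed */
	int nr_cpus_allowed;       /* models p->nr_cpus_allowed */
	bool queued;               /* models task_on_rq_queued(p) */
	bool running;              /* models task_running(rq, p) */
	int push_attempts;         /* counts modelled push_rt_tasks() calls */
	int core_updates;          /* counts assignments done by the core */
};

/* Count the set bits in the mask; models cpumask_weight(). */
int sketch_weight(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

bool sketch_set_cpus_allowed_rt(struct sketch_task *p, unsigned int new_mask)
{
	int weight = sketch_weight(new_mask);

	if (!p->queued)
		return false;

	/* Models !cpumask_subset(new_mask, old): at least one CPU was added. */
	if (weight > 1 && !p->running && (new_mask & ~p->cpus_allowed) != 0) {
		/* Update the affinity first so the push sees the new mask. */
		p->cpus_allowed = new_mask;
		p->nr_cpus_allowed = weight;
		p->push_attempts++;
		return true;
	}

	return false;
}

const struct sketch_class sketch_rt_class = {
	.set_cpus_allowed = sketch_set_cpus_allowed_rt,
};

void sketch_do_set_cpus_allowed(struct sketch_task *p, unsigned int new_mask)
{
	bool updated = false;

	if (p->cls->set_cpus_allowed != NULL)
		updated = p->cls->set_cpus_allowed(p, new_mask);

	/* The core only assigns when the class has not already done so. */
	if (!updated) {
		p->cpus_allowed = new_mask;
		p->nr_cpus_allowed = sketch_weight(new_mask);
		p->core_updates++;
	}
}
```

Widening the mask of a queued, non-running task from 0x1 to 0x3 takes the class path and counts one push, while narrowing it back to 0x1 falls through to the core assignment.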