2018-10-05 14:11:10

by Peng Hao

Subject: [PATCH v3] sched/rt : return accurate release rq lock info

find_lock_lowest_rq() may or may not have released the rq lock when it
returns lowest_rq == NULL, and the caller cannot tell which happened.
If the rq lock was not released, it is unnecessary to re-call
pick_next_pushable_task().
With CONFIG_PREEMPT=n, returning lowest_rq == NULL without releasing
the rq lock happens frequently in a simple test case: four tasks of
different RT priorities running on only two CPUs.
Thanks to Steven Rostedt for his advice.

Signed-off-by: Peng Hao <[email protected]>
---
kernel/sched/rt.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e2955a..be0fc43 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 			     !task_on_rq_queued(task))) {
 
 				double_unlock_balance(rq, lowest_rq);
-				lowest_rq = NULL;
+				lowest_rq = RETRY_TASK;
 				break;
 			}
 		}
@@ -1830,7 +1830,9 @@ static int push_rt_task(struct rq *rq)
 
 	/* find_lock_lowest_rq locks the rq if found */
 	lowest_rq = find_lock_lowest_rq(next_task, rq);
-	if (!lowest_rq) {
+	if (!lowest_rq)
+		goto out;
+	if (lowest_rq == RETRY_TASK) {
 		struct task_struct *task;
 		/*
 		 * find_lock_lowest_rq releases rq->lock
--
1.8.3.1



2018-10-05 14:27:42

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

On Sat, 6 Oct 2018 06:22:11 +0800
Peng Hao <[email protected]> wrote:

> find_lock_lowest_rq may or not releease rq lock when return
> lowest_rq=NULL, but it is fuzzy.
> If not releasing rq lock, it is unnecessary to re-call
> pick_next_pushable_task.
> When CONFIG_PREEMPT=n, not releasing rq lock and return
> lowest_rq=null frequently happens in a simple test case:
> Four different rt priority tasks run on limited two cpus.
> Thanks for Steven Rostedt's advice.
>
> Signed-off-by: Peng Hao <[email protected]>
> ---
> kernel/sched/rt.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 2e2955a..be0fc43 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> !task_on_rq_queued(task))) {
>
> double_unlock_balance(rq, lowest_rq);
> - lowest_rq = NULL;
> + lowest_rq = RETRY_TASK;
> break;
> }
> }
> @@ -1830,7 +1830,9 @@ static int push_rt_task(struct rq *rq)
>
> /* find_lock_lowest_rq locks the rq if found */
> lowest_rq = find_lock_lowest_rq(next_task, rq);
> - if (!lowest_rq) {
> + if (!lowest_rq)
> + goto out;
> + if (lowest_rq == RETRY_TASK) {

This probably makes no difference, and can be blown off as just a
preference, but should this be:

	if (!lowest_rq) {
		goto out;
	} else if (lowest_rq == RETRY_TASK) {

The logic is the same regardless, so it's really just a matter of taste.

That said:

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve


> struct task_struct *task;
> /*
> * find_lock_lowest_rq releases rq->lock


2018-10-05 15:48:23

by Andrea Parri

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

Hi Peng,

On Sat, Oct 06, 2018 at 06:22:11AM +0800, Peng Hao wrote:
> find_lock_lowest_rq may or not releease rq lock when return
> lowest_rq=NULL, but it is fuzzy.
> If not releasing rq lock, it is unnecessary to re-call
> pick_next_pushable_task.

IIRC, deadline.c uses a similar pattern (c.f., find_lock_later_rq() and
pick_next_pushable_dl_task()): should it be considered for this change?

Andrea


> When CONFIG_PREEMPT=n, not releasing rq lock and return
> lowest_rq=null frequently happens in a simple test case:
> Four different rt priority tasks run on limited two cpus.
> Thanks for Steven Rostedt's advice.
>
> Signed-off-by: Peng Hao <[email protected]>
> ---
> kernel/sched/rt.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 2e2955a..be0fc43 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> !task_on_rq_queued(task))) {
>
> double_unlock_balance(rq, lowest_rq);
> - lowest_rq = NULL;
> + lowest_rq = RETRY_TASK;
> break;
> }
> }
> @@ -1830,7 +1830,9 @@ static int push_rt_task(struct rq *rq)
>
> /* find_lock_lowest_rq locks the rq if found */
> lowest_rq = find_lock_lowest_rq(next_task, rq);
> - if (!lowest_rq) {
> + if (!lowest_rq)
> + goto out;
> + if (lowest_rq == RETRY_TASK) {
> struct task_struct *task;
> /*
> * find_lock_lowest_rq releases rq->lock
> --
> 1.8.3.1
>

2018-10-05 16:10:39

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

On Sat, 6 Oct 2018 00:03:58 +0800 (CST)
<[email protected]> wrote:

> >Hi Peng,
>
> >On Sat, Oct 06, 2018 at 06:22:11AM +0800, Peng Hao wrote:
> >> find_lock_lowest_rq may or not releease rq lock when return
> >> lowest_rq=NULL, but it is fuzzy.
> >> If not releasing rq lock, it is unnecessary to re-call
> >> pick_next_pushable_task.
>
> >IIRC, deadline.c uses a similar pattern (c.f., find_lock_later_rq() and
> >pick_next_pushable_dl_task()): should it be considered for this change?
> peterz asked the same question.
> at first I thought dl's retry action is simple. But now the change is simpler, I will
> add it.
>

I would still do that as a separate patch though.

-- Steve

2018-10-15 09:21:13

by Peter Zijlstra

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

On Sat, Oct 06, 2018 at 06:22:11AM +0800, Peng Hao wrote:
> find_lock_lowest_rq may or not releease rq lock when return
> lowest_rq=NULL, but it is fuzzy.
> If not releasing rq lock, it is unnecessary to re-call
> pick_next_pushable_task.
> When CONFIG_PREEMPT=n, not releasing rq lock and return
> lowest_rq=null frequently happens in a simple test case:
> Four different rt priority tasks run on limited two cpus.
> Thanks for Steven Rostedt's advice.

Can we please write a more coherent Changelog? The above is very hard to
read.

Maybe something along the lines of:

Subject: sched/rt: Reduce push_rt_task() retries

Improve push_rt_task() by propagating the double_lock_balance() usage
from find_lock_lowest_rq(), thereby reducing the number of cases where
we have to assume rq->lock was dropped.


> Signed-off-by: Peng Hao <[email protected]>
> ---
> kernel/sched/rt.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 2e2955a..be0fc43 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> !task_on_rq_queued(task))) {
>
> double_unlock_balance(rq, lowest_rq);
> - lowest_rq = NULL;
> + lowest_rq = RETRY_TASK;
> break;
> }
> }

I'm confused.. should not:

/* try again */
double_unlock_balance(rq, lowest_rq);
lowest_rq = NULL;

also return RETRY_TASK? That also is in the double_lock_balance() path
and will thus have had rq->lock released.

2018-10-15 15:43:07

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

On Mon, 15 Oct 2018 11:20:32 +0200
Peter Zijlstra <[email protected]> wrote:

> > index 2e2955a..be0fc43 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> > !task_on_rq_queued(task))) {
> >
> > double_unlock_balance(rq, lowest_rq);
> > - lowest_rq = NULL;
> > + lowest_rq = RETRY_TASK;
> > break;
> > }
> > }
>
> I'm confused.. should not:
>
> /* try again */
> double_unlock_balance(rq, lowest_rq);
> lowest_rq = NULL;
>
> also return RETRY_TASK? That also is in the double_lock_balance() path
> and will this have had rq->lock() released.

I thought the same thing at first, but this is in the loop path, where
it does everything again. But now looking closer, I think there's a bug
in the original code.

We only do the check if the immediate double_lock_balance() released
the current task rq lock, but we don't take into account if it was
released earlier, which means it could have migrated and we never
noticed!

I believe the code should look like this:

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e2955a8cf8f..2c9128ce61e2 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1718,6 +1718,7 @@ static int find_lowest_rq(struct task_struct *task)
 static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 {
 	struct rq *lowest_rq = NULL;
+	bool released = false;
 	int tries;
 	int cpu;
 
@@ -1740,7 +1741,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 		}
 
 		/* if the prio of this runqueue changed, try again */
-		if (double_lock_balance(rq, lowest_rq)) {
+		if (double_lock_balance(rq, lowest_rq) || released) {
 			/*
 			 * We had to unlock the run queue. In
 			 * the mean time, task could have
@@ -1754,7 +1755,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 			     !task_on_rq_queued(task))) {
 
 				double_unlock_balance(rq, lowest_rq);
-				lowest_rq = NULL;
+				lowest_rq = RETRY_TASK;
 				break;
 			}
 		}
@@ -1764,10 +1765,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 			break;
 
 		/* try again */
-		double_unlock_balance(rq, lowest_rq);
+		if (double_unlock_balance(rq, lowest_rq))
+			released = true;
+
 		lowest_rq = NULL;
 	}
 
+	if (!lowest_rq && released)
+		lowest_rq = RETRY_TASK;
+
 	return lowest_rq;
 }

-- Steve

2018-10-15 17:21:57

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

On Tue, 16 Oct 2018 00:09:43 +0800 (CST)
<[email protected]> wrote:

> >We only do the check if the immediate double_lock_balance() released
> >the current task rq lock, but we don't take into account if it was
> >released earlier, which means it could have migrated and we never
> >noticed!
> >
> double_lock_balance() may release the current rq's lock, but only to acquire the two rqs' locks in order,
> and it immediately reacquires the current rq's lock before double_lock_balance() returns.
> >I believe the code should look like this:
> >

Bah, I didn't even compile it. And thought it was
"double_lock_balance", and didn't notice it was double_unlock_balance()
(this is what I get for trying to do too much at once).

Sad part is, I noticed this back when I added reviewed-by, but then
looking at it again, I did the same mistake :-/

Yeah, never mind, it's fine, my original reviewed-by stands.

-- Steve

2018-10-16 12:37:32

by Peter Zijlstra

Subject: Re: [PATCH v3] sched/rt : return accurate release rq lock info

On Mon, Oct 15, 2018 at 11:42:20AM -0400, Steven Rostedt wrote:
> On Mon, 15 Oct 2018 11:20:32 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > > index 2e2955a..be0fc43 100644
> > > --- a/kernel/sched/rt.c
> > > +++ b/kernel/sched/rt.c
> > > @@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> > > !task_on_rq_queued(task))) {
> > >
> > > double_unlock_balance(rq, lowest_rq);
> > > - lowest_rq = NULL;
> > > + lowest_rq = RETRY_TASK;
> > > break;
> > > }
> > > }
> >
> > I'm confused.. should not:
> >
> > /* try again */
> > double_unlock_balance(rq, lowest_rq);
> > lowest_rq = NULL;
> >
> > also return RETRY_TASK? That also is in the double_lock_balance() path
> > and will this have had rq->lock() released.
>
> I thought the same thing at first, but this is in the loop path, where
> it does everything again. But now looking closer, I think there's a bug
> in the original code.

So I find that whole thing utterly confusing; what about we start with
something like so?

---
kernel/sched/rt.c | 51 +++++++++++++++++++++++----------------------------
1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e2955a8cf8f..237c84c2b042 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1714,6 +1714,26 @@ static int find_lowest_rq(struct task_struct *task)
 	return -1;
 }
 
+static struct task_struct *first_pushable_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_tasks(rq))
+		return NULL;
+
+	p = plist_first_entry(&rq->rt.pushable_tasks,
+			      struct task_struct, pushable_tasks);
+
+	BUG_ON(rq->cpu != task_cpu(p));
+	BUG_ON(task_current(rq, p));
+	BUG_ON(p->nr_cpus_allowed <= 1);
+
+	BUG_ON(!task_on_rq_queued(p));
+	BUG_ON(!rt_task(p));
+
+	return p;
+}
+
 /* Will lock the rq it finds */
 static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 {
@@ -1747,12 +1767,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 			 * migrated already or had its affinity changed.
 			 * Also make sure that it wasn't scheduled on its rq.
 			 */
-			if (unlikely(task_rq(task) != rq ||
-				     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_allowed) ||
-				     task_running(rq, task) ||
-				     !rt_task(task) ||
-				     !task_on_rq_queued(task))) {
-
+			if (first_pushable_task(rq) != task) {
 				double_unlock_balance(rq, lowest_rq);
 				lowest_rq = NULL;
 				break;
@@ -1771,26 +1786,6 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 	return lowest_rq;
 }
 
-static struct task_struct *pick_next_pushable_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_tasks(rq))
-		return NULL;
-
-	p = plist_first_entry(&rq->rt.pushable_tasks,
-			      struct task_struct, pushable_tasks);
-
-	BUG_ON(rq->cpu != task_cpu(p));
-	BUG_ON(task_current(rq, p));
-	BUG_ON(p->nr_cpus_allowed <= 1);
-
-	BUG_ON(!task_on_rq_queued(p));
-	BUG_ON(!rt_task(p));
-
-	return p;
-}
-
 /*
  * If the current CPU has more than one RT task, see if the non
  * running task can migrate over to a CPU that is running a task
@@ -1805,7 +1800,7 @@ static int push_rt_task(struct rq *rq)
 	if (!rq->rt.overloaded)
 		return 0;
 
-	next_task = first_pushable_task(rq);
+	next_task = first_pushable_task(rq);
 	if (!next_task)
 		return 0;
 
@@ -1840,7 +1835,7 @@ static int push_rt_task(struct rq *rq)
 	 * run-queue and is also still the next task eligible for
 	 * pushing.
 	 */
-	task = pick_next_pushable_task(rq);
+	task = first_pushable_task(rq);
 	if (task == next_task) {
 		/*
 		 * The task hasn't migrated, and is still the next