2017-08-18 08:22:07

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 0/2] Make find_later_rq() choose a closer cpu in topology

Thanks to Joel, I could fix a typo and simplify code more.

-----8<-----

When cpudl_find() returns any among free_cpus, the cpu might not be
closer than others, considering sched domain. For example:

this_cpu: 15
free_cpus: 0, 1,..., 14 (== later_mask)
best_cpu: 0

topology:

0 --+
+--+
1 --+ |
+-- ... --+
2 --+ | |
+--+ |
3 --+ |

... ...

12 --+ |
+--+ |
13 --+ | |
+-- ... -+
14 --+ |
+--+
15 --+

In this case, it would be best to select 14 since it's a free cpu and
closest to 15(this_cpu). However, currently the code select 0(best_cpu)
even though that's just any among free_cpus. Fix it.

Change from v7
-. fix a trivial typo
-. modify commit messages to explain what it does more clearly
-. simplify code with an existing macro

Change from v6
-. add a comment about selection of fallback_cpu incase more than one exist
-. modify a comment explaining what we do wrt PREFER_SIBLING

Change from v5
-. exclude two patches already picked up by peterz
(sched/deadline: Make find_later_rq() choose a closer cpu in topology)
(sched/deadline: Change return value of cpudl_find())
-. apply what peterz fixed for 'prefer sibling', into deadline and rt

Change from v4
-. remove a patch that might cause huge lock contention
(by spin lock(&cpudl.lock) in a hot path of scheduler)

Change from v3
-. rename closest_cpu to best_cpu so that it align with rt
-. protect referring cpudl.elements with cpudl.lock
-. change return value of cpudl_find() to bool

Change from v2
-. add support for SD_PREFER_SIBLING

Change from v1
-. clean up the patch

Byungchul Park (2):
sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()
sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq()

kernel/sched/deadline.c | 55 +++++++++++++++++++++++++++++++++++++++++++++---
kernel/sched/rt.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 105 insertions(+), 6 deletions(-)

--
1.9.1


2017-08-18 08:22:09

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

It would be better to try to check other siblings first if
SD_PREFER_SIBLING is flaged when pushing tasks - migration.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/deadline.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0223694..115250b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1319,12 +1319,35 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu

static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask_dl);

+/*
+ * Find the first cpu in: mask & sd & ~prefer
+ */
+static int find_cpu(const struct cpumask *mask,
+ const struct sched_domain *sd,
+ const struct sched_domain *prefer)
+{
+ const struct cpumask *sds = sched_domain_span(sd);
+ const struct cpumask *ps = prefer ? sched_domain_span(prefer) : NULL;
+ int cpu;
+
+ for_each_cpu(cpu, mask) {
+ if (!cpumask_test_cpu(cpu, sds))
+ continue;
+ if (ps && cpumask_test_cpu(cpu, ps))
+ continue;
+ break;
+ }
+
+ return cpu;
+}
+
static int find_later_rq(struct task_struct *task)
{
- struct sched_domain *sd;
+ struct sched_domain *sd, *prefer = NULL;
struct cpumask *later_mask = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
int this_cpu = smp_processor_id();
int cpu = task_cpu(task);
+ int fallback_cpu = -1;

/* Make sure the mask is initialized first */
if (unlikely(!later_mask))
@@ -1376,8 +1399,7 @@ static int find_later_rq(struct task_struct *task)
return this_cpu;
}

- best_cpu = cpumask_first_and(later_mask,
- sched_domain_span(sd));
+ best_cpu = find_cpu(later_mask, sd, prefer);
/*
* Last chance: if a cpu being in both later_mask
* and current sd span is valid, that becomes our
@@ -1385,6 +1407,26 @@ static int find_later_rq(struct task_struct *task)
* already under consideration through later_mask.
*/
if (best_cpu < nr_cpu_ids) {
+ /*
+ * If current domain is SD_PREFER_SIBLING
+ * flaged, we have to try to check other
+ * siblings first.
+ */
+ if (sd->flags & SD_PREFER_SIBLING) {
+ prefer = sd;
+
+ /*
+ * fallback_cpu should be one
+ * in the closest domain among
+ * SD_PREFER_SIBLING domains,
+ * in case that more than one
+ * SD_PREFER_SIBLING domains
+ * exist in the hierachy.
+ */
+ if (fallback_cpu == -1)
+ fallback_cpu = best_cpu;
+ continue;
+ }
rcu_read_unlock();
return best_cpu;
}
@@ -1393,6 +1435,13 @@ static int find_later_rq(struct task_struct *task)
rcu_read_unlock();

/*
+ * If fallback_cpu is valid, all our guesses failed *except* for
+ * SD_PREFER_SIBLING domain. Now, we can return the fallback cpu.
+ */
+ if (fallback_cpu != -1)
+ return fallback_cpu;
+
+ /*
* At this point, all our guesses failed, we just return
* 'something', and let the caller sort the things out.
*/
--
1.9.1

2017-08-18 08:22:08

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 2/2] sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq()

It would be better to try to check other siblings first if
SD_PREFER_SIBLING is flaged when pushing tasks - migration.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/rt.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 979b734..b6d8ba9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1618,12 +1618,35 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)

static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);

+/*
+ * Find the first cpu in: mask & sd & ~prefer
+ */
+static int find_cpu(const struct cpumask *mask,
+ const struct sched_domain *sd,
+ const struct sched_domain *prefer)
+{
+ const struct cpumask *sds = sched_domain_span(sd);
+ const struct cpumask *ps = prefer ? sched_domain_span(prefer) : NULL;
+ int cpu;
+
+ for_each_cpu(cpu, mask) {
+ if (!cpumask_test_cpu(cpu, sds))
+ continue;
+ if (ps && cpumask_test_cpu(cpu, ps))
+ continue;
+ break;
+ }
+
+ return cpu;
+}
+
static int find_lowest_rq(struct task_struct *task)
{
- struct sched_domain *sd;
+ struct sched_domain *sd, *prefer = NULL;
struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
int this_cpu = smp_processor_id();
int cpu = task_cpu(task);
+ int fallback_cpu = -1;

/* Make sure the mask is initialized first */
if (unlikely(!lowest_mask))
@@ -1668,9 +1691,29 @@ static int find_lowest_rq(struct task_struct *task)
return this_cpu;
}

- best_cpu = cpumask_first_and(lowest_mask,
- sched_domain_span(sd));
+ best_cpu = find_cpu(lowest_mask, sd, prefer);
+
if (best_cpu < nr_cpu_ids) {
+ /*
+ * If current domain is SD_PREFER_SIBLING
+ * flaged, we have to try to check other
+ * siblings first.
+ */
+ if (sd->flags & SD_PREFER_SIBLING) {
+ prefer = sd;
+
+ /*
+ * fallback_cpu should be one
+ * in the closest domain among
+ * SD_PREFER_SIBLING domains,
+ * in case that more than one
+ * SD_PREFER_SIBLING domains
+ * exist in the hierachy.
+ */
+ if (fallback_cpu == -1)
+ fallback_cpu = best_cpu;
+ continue;
+ }
rcu_read_unlock();
return best_cpu;
}
@@ -1679,6 +1722,13 @@ static int find_lowest_rq(struct task_struct *task)
rcu_read_unlock();

/*
+ * If fallback_cpu is valid, all our guesses failed *except* for
+ * SD_PREFER_SIBLING domain. Now, we can return the fallback cpu.
+ */
+ if (fallback_cpu != -1)
+ return fallback_cpu;
+
+ /*
* And finally, if there were no matches within the domains
* just give the caller *something* to work with from the compatible
* locations.
--
1.9.1

2017-08-18 13:39:02

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v8 2/2] sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq()

On Fri, 18 Aug 2017 17:21:59 +0900
Byungchul Park <[email protected]> wrote:

> It would be better to try to check other siblings first if
> SD_PREFER_SIBLING is flaged when pushing tasks - migration.
>
> Signed-off-by: Byungchul Park <[email protected]>

Looks good.

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

2017-08-21 13:45:03

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

Hi,

On 18/08/17 17:21, Byungchul Park wrote:
> It would be better to try to check other siblings first if
> SD_PREFER_SIBLING is flaged when pushing tasks - migration.
>
> Signed-off-by: Byungchul Park <[email protected]>

Mmm, this looks like Peter's proposed patch, maybe add (at least) a
Suggested-by: him ?

https://marc.info/?l=linux-kernel&m=150176183807073

Also, I'm not sure what Peter meant with

"But still this isn't quite right, because when we consider this for SMT
(as was the intent here) we'll happily occupy a full sibling core over
finding an empty one."

since we are still using the later_mask, which should not include full
cores (unless it is the one with the lates deadline)?

> ---
> kernel/sched/deadline.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0223694..115250b 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1319,12 +1319,35 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu
>
> static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask_dl);
>
> +/*
> + * Find the first cpu in: mask & sd & ~prefer
> + */
> +static int find_cpu(const struct cpumask *mask,
> + const struct sched_domain *sd,
> + const struct sched_domain *prefer)
> +{
> + const struct cpumask *sds = sched_domain_span(sd);
> + const struct cpumask *ps = prefer ? sched_domain_span(prefer) : NULL;
> + int cpu;
> +
> + for_each_cpu(cpu, mask) {
> + if (!cpumask_test_cpu(cpu, sds))
> + continue;
> + if (ps && cpumask_test_cpu(cpu, ps))
> + continue;
> + break;
> + }
> +
> + return cpu;
> +}
> +
> static int find_later_rq(struct task_struct *task)
> {
> - struct sched_domain *sd;
> + struct sched_domain *sd, *prefer = NULL;
> struct cpumask *later_mask = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
> int this_cpu = smp_processor_id();
> int cpu = task_cpu(task);
> + int fallback_cpu = -1;
>
> /* Make sure the mask is initialized first */
> if (unlikely(!later_mask))
> @@ -1376,8 +1399,7 @@ static int find_later_rq(struct task_struct *task)
> return this_cpu;
> }
>
> - best_cpu = cpumask_first_and(later_mask,
> - sched_domain_span(sd));
> + best_cpu = find_cpu(later_mask, sd, prefer);
> /*
> * Last chance: if a cpu being in both later_mask
> * and current sd span is valid, that becomes our
> @@ -1385,6 +1407,26 @@ static int find_later_rq(struct task_struct *task)
> * already under consideration through later_mask.
> */

It seems that the comment above should be updated as well.

Thanks,

- Juri

2017-08-21 13:56:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

On Mon, Aug 21, 2017 at 02:44:58PM +0100, Juri Lelli wrote:

> Also, I'm not sure what Peter meant with
>
> "But still this isn't quite right, because when we consider this for SMT
> (as was the intent here) we'll happily occupy a full sibling core over
> finding an empty one."

Consider a 4 core, SMT2 system:

LLC [0 - 7]

SMT [0,1] [2,3] [4,5] [6,7]

If we do a wake-up on CPU0, we'll find CPU1, mark that as fallback,
continue up the domain tree, exclude 0,1 from 0-7 and find CPU2.

A next wakeup on CPU0 does the same and will find CPU3, fully loading
that core, instead of considering CPU4 first.

Doing this 'right' is difficult and expensive :-/

2017-08-21 14:08:05

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

On 21/08/17 15:56, Peter Zijlstra wrote:
> On Mon, Aug 21, 2017 at 02:44:58PM +0100, Juri Lelli wrote:
>
> > Also, I'm not sure what Peter meant with
> >
> > "But still this isn't quite right, because when we consider this for SMT
> > (as was the intent here) we'll happily occupy a full sibling core over
> > finding an empty one."
>
> Consider a 4 core, SMT2 system:
>
> LLC [0 - 7]
>
> SMT [0,1] [2,3] [4,5] [6,7]
>
> If we do a wake-up on CPU0, we'll find CPU1, mark that as fallback,
> continue up the domain tree, exclude 0,1 from 0-7 and find CPU2.
>
> A next wakeup on CPU0 does the same and will find CPU3, fully loading
> that core, instead of considering CPU4 first.
>

Ah, right, I see. Thanks for explaining.

Byungchul, maybe you could add this explanation as a comment?

> Doing this 'right' is difficult and expensive :-/
>

2017-08-22 05:53:37

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

On Mon, Aug 21, 2017 at 02:44:58PM +0100, Juri Lelli wrote:
> Hi,
> On 18/08/17 17:21, Byungchul Park wrote:
> > It would be better to try to check other siblings first if
> > SD_PREFER_SIBLING is flaged when pushing tasks - migration.
> >
> > Signed-off-by: Byungchul Park <[email protected]>
>
> Mmm, this looks like Peter's proposed patch, maybe add (at least) a
> Suggested-by: him ?

Hi Juri,

Why not. I will add it from the next spin.

BTW, is it enough? I don't know the way I should do, whenever I got
thankful suggestions. I really want to add them as a separate patch
which can be stacked on my patches _if possible_. But in case that
it's better to merge them into one like this, I don't know how.

I mean I will add 'Suggested-by' from now on - I learned what I should
do (at least) in this case thanks to Juri, but I'm still not sure if
it's enough.

Speaking of which, I have something to ask Peterz and Ingo for. I really
want to interact with maintainers actively e.g. asking ways they prefer.
But it takes too much long to get responses from them e.g. at most 2
monthes in case rushing them. I should have decided and done what the
best I think is, than asking.

It would be very appriciated if you pay more attention.

> > @@ -1376,8 +1399,7 @@ static int find_later_rq(struct task_struct *task)
> > return this_cpu;
> > }
> >
> > - best_cpu = cpumask_first_and(later_mask,
> > - sched_domain_span(sd));
> > + best_cpu = find_cpu(later_mask, sd, prefer);
> > /*
> > * Last chance: if a cpu being in both later_mask
> > * and current sd span is valid, that becomes our
> > @@ -1385,6 +1407,26 @@ static int find_later_rq(struct task_struct *task)
> > * already under consideration through later_mask.
> > */
>
> It seems that the comment above should be updated as well.

How? Could you explain it more?

Thanks,
Byungchul

2017-08-22 05:55:39

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

On Mon, Aug 21, 2017 at 03:07:57PM +0100, Juri Lelli wrote:
> > Consider a 4 core, SMT2 system:
> >
> > LLC [0 - 7]
> >
> > SMT [0,1] [2,3] [4,5] [6,7]
> >
> > If we do a wake-up on CPU0, we'll find CPU1, mark that as fallback,
> > continue up the domain tree, exclude 0,1 from 0-7 and find CPU2.
> >
> > A next wakeup on CPU0 does the same and will find CPU3, fully loading
> > that core, instead of considering CPU4 first.
> >
>
> Ah, right, I see. Thanks for explaining.
>
> Byungchul, maybe you could add this explanation as a comment?

Yes. Good idea. I will add it.

Thank you,
Byungchul

2017-08-22 07:12:42

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

On Tue, Aug 22, 2017 at 02:53:25PM +0900, Byungchul Park wrote:
> On Mon, Aug 21, 2017 at 02:44:58PM +0100, Juri Lelli wrote:
> > Hi,
> > On 18/08/17 17:21, Byungchul Park wrote:
> > > It would be better to try to check other siblings first if
> > > SD_PREFER_SIBLING is flaged when pushing tasks - migration.
> > >
> > > Signed-off-by: Byungchul Park <[email protected]>
> >
> > Mmm, this looks like Peter's proposed patch, maybe add (at least) a
> > Suggested-by: him ?
>
> Hi Juri,
>
> Why not. I will add it from the next spin.
>
> BTW, is it enough? I don't know the way I should do, whenever I got
> thankful suggestions. I really want to add them as a separate patch
> which can be stacked on my patches _if possible_. But in case that
> it's better to merge them into one like this, I don't know how.
>
> I mean I will add 'Suggested-by' from now on - I learned what I should
> do (at least) in this case thanks to Juri, but I'm still not sure if
> it's enough.
>
> Speaking of which, I have something to ask Peterz and Ingo for. I really
> want to interact with maintainers actively e.g. asking ways they prefer.
> But it takes too much long to get responses from them e.g. at most 2
> monthes in case rushing them. I should have decided and done what the
> best I think is, than asking.
>
> It would be very appriciated if you pay more attention.
>
> > > @@ -1376,8 +1399,7 @@ static int find_later_rq(struct task_struct *task)
> > > return this_cpu;
> > > }
> > >
> > > - best_cpu = cpumask_first_and(later_mask,
> > > - sched_domain_span(sd));
> > > + best_cpu = find_cpu(later_mask, sd, prefer);
> > > /*
> > > * Last chance: if a cpu being in both later_mask
> > > * and current sd span is valid, that becomes our
> > > @@ -1385,6 +1407,26 @@ static int find_later_rq(struct task_struct *task)
> > > * already under consideration through later_mask.
> > > */
> >
> > It seems that the comment above should be updated as well.
>
> How? Could you explain it more?

Let me try it by myself.. Please fix me at the next spin if needed.

Thank you,
Byungchul

2017-08-22 07:42:21

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH v8 1/2] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()

Hi,

On 22/08/17 14:53, Byungchul Park wrote:
> On Mon, Aug 21, 2017 at 02:44:58PM +0100, Juri Lelli wrote:
> > Hi,
> > On 18/08/17 17:21, Byungchul Park wrote:
> > > It would be better to try to check other siblings first if
> > > SD_PREFER_SIBLING is flaged when pushing tasks - migration.
> > >
> > > Signed-off-by: Byungchul Park <[email protected]>
> >
> > Mmm, this looks like Peter's proposed patch, maybe add (at least) a
> > Suggested-by: him ?
>
> Hi Juri,
>
> Why not. I will add it from the next spin.
>
> BTW, is it enough? I don't know the way I should do, whenever I got
> thankful suggestions. I really want to add them as a separate patch
> which can be stacked on my patches _if possible_. But in case that
> it's better to merge them into one like this, I don't know how.
>

It usually depends on the entity of the suggestion (is it expressed with
a sentence, actual code or a proper patch?) and what the person
suggesting it is fine with. I usually simply ask. :)

> I mean I will add 'Suggested-by' from now on - I learned what I should
> do (at least) in this case thanks to Juri, but I'm still not sure if
> it's enough.
>
> Speaking of which, I have something to ask Peterz and Ingo for. I really
> want to interact with maintainers actively e.g. asking ways they prefer.
> But it takes too much long to get responses from them e.g. at most 2
> monthes in case rushing them. I should have decided and done what the
> best I think is, than asking.
>
> It would be very appriciated if you pay more attention.
>

I try to be as responsive as I can (I think this applies to everyone)
and I apologize if from time to time it takes too much to reply. Balance
between upstream and product work means that there are times when one of
the two gets a bit delayed, I'm afraid.

Please keep asking questions, propose solutions and chase people! :)

> > > @@ -1376,8 +1399,7 @@ static int find_later_rq(struct task_struct *task)
> > > return this_cpu;
> > > }
> > >
> > > - best_cpu = cpumask_first_and(later_mask,
> > > - sched_domain_span(sd));
> > > + best_cpu = find_cpu(later_mask, sd, prefer);
> > > /*
> > > * Last chance: if a cpu being in both later_mask
> > > * and current sd span is valid, that becomes our
> > > @@ -1385,6 +1407,26 @@ static int find_later_rq(struct task_struct *task)
> > > * already under consideration through later_mask.
> > > */
> >
> > It seems that the comment above should be updated as well.
>
> How? Could you explain it more?
>

Simply removing it might just work.

Thanks,

- Juri