This patchset removes an unnecessary comparison and fixes some corner cases in
the find_idlest_group path of select_task_rq_fair.
Changes v2 -> v3:
- 1/5 through 4/5 unchanged.
- 5/5: Essentially undid [1]. Vincent pointed out that this isn't really a
bugfix but more of a change in policy. Now find_idlest_group still returns
NULL when the local group is idlest.
I previously said some extra code could be removed in this case, but after
more careful inspection I changed my mind.
Changes v1 -> v2:
- Reworked task affinity checks to avoid repeating them, as per Vincent's
suggestion. To avoid excessive indentation this required moving code into its
own function, as per PeterZ's suggestion.
- Split up the patches.
[1]
- Altered the caller of find_idlest_group so that it now unconditionally uses
find_idlest_group_cpu (formerly find_idlest_cpu). This means we more often use
the maligned "perspective-switching" logic at the bottom of the while(sd) loop,
but it also makes the task placement algorithm consistent regardless of whether
the idlest group is local or remote.
As mentioned in 5/5, an alternative would be to just initialise @new_cpu to
@cpu instead of @prev_cpu (which is what PeterZ suggested in the v1 review). In
that case, some extra code could be removed in and around
find_idlest_group_cpu.
Brendan Jackman (5):
sched/fair: Move select_task_rq_fair slow-path into its own function
sched/fair: Remove unnecessary comparison with -1
sched/fair: Fix find_idlest_group when local group is not allowed
sched/fair: Fix use of find_idlest_group when no groups are allowed
sched/fair: Fix use of find_idlest_group when local group is idlest
kernel/sched/fair.c | 93 ++++++++++++++++++++++++++++++++---------------------
1 file changed, 56 insertions(+), 37 deletions(-)
--
2.7.4
Since commit 83a0a96a5f26 ("sched/fair: Leverage the idle state info
when choosing the "idlest" cpu"), find_idlest_group_cpu (formerly
find_idlest_cpu) no longer returns -1, so the comparison in the
slow path can never be true. Remove it.
Signed-off-by: Brendan Jackman <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f6e277c6..4ccecbf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5583,7 +5583,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
}
new_cpu = find_idlest_group_cpu(group, p, cpu);
- if (new_cpu == -1 || new_cpu == cpu) {
+ if (new_cpu == cpu) {
/* Now try balancing at a lower domain level of cpu */
sd = sd->child;
continue;
--
2.7.4
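
For reference, a minimal userspace sketch of the structure that makes -1
impossible. This is not the kernel function itself; it assumes, as in mainline,
that least_loaded_cpu is seeded with this_cpu, and uses a plain load array as a
stand-in for the per-CPU statistics:

#include <limits.h>

/*
 * Model of find_idlest_group_cpu(): every path returns a valid CPU index,
 * never -1, because the fallback least_loaded_cpu starts at this_cpu.
 */
int find_idlest_group_cpu_sketch(const unsigned long *load, int nr_cpus,
				 int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	int least_loaded_cpu = this_cpu;	/* fallback is never -1 */
	int shallowest_idle_cpu = -1;
	int i;

	for (i = 0; i < nr_cpus; i++) {
		if (load[i] == 0) {		/* stand-in for "cpu i is idle" */
			shallowest_idle_cpu = i;
		} else if (load[i] < min_load) {
			min_load = load[i];
			least_loaded_cpu = i;
		}
	}

	/* Same shape as the real return: an idle CPU if found, else the fallback. */
	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}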
When the local group is not allowed we do not modify this_*_load from
their initial value of 0. That means that the load checks at the end
of find_idlest_group cause us to incorrectly return NULL. Fixing the
initial values to ULONG_MAX means we will instead return the idlest
remote group in that case.
Signed-off-by: Brendan Jackman <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4ccecbf..0ce75bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5387,8 +5387,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
{
struct sched_group *idlest = NULL, *group = sd->groups;
struct sched_group *most_spare_sg = NULL;
- unsigned long min_runnable_load = ULONG_MAX, this_runnable_load = 0;
- unsigned long min_avg_load = ULONG_MAX, this_avg_load = 0;
+ unsigned long min_runnable_load = ULONG_MAX;
+ unsigned long this_runnable_load = ULONG_MAX;
+ unsigned long min_avg_load = ULONG_MAX, this_avg_load = ULONG_MAX;
unsigned long most_spare = 0, this_spare = 0;
int load_idx = sd->forkexec_idx;
int imbalance_scale = 100 + (sd->imbalance_pct-100)/2;
--
2.7.4
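
To see the failure mode concretely, here is a minimal sketch, not the kernel
code (the real checks in find_idlest_group also involve imbalance and average
load), of the final "is the local group good enough?" decision:

#include <limits.h>
#include <stdio.h>

/* Returns non-zero when the local group is treated as idlest ("return NULL"). */
static int local_group_looks_idlest(unsigned long this_runnable_load,
				    unsigned long min_runnable_load)
{
	return this_runnable_load <= min_runnable_load;
}

int main(void)
{
	/* Old initial value of 0: a disallowed local group still "wins" vs load 123. */
	printf("init 0:         %d\n", local_group_looks_idlest(0, 123));
	/* New initial value of ULONG_MAX: the idlest remote group is returned instead. */
	printf("init ULONG_MAX: %d\n", local_group_looks_idlest(ULONG_MAX, 123));
	return 0;
}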
find_idlest_group returns NULL when the local group is idlest. The
caller then continues the find_idlest_group search at a lower level
of the current CPU's sched_domain hierarchy. find_idlest_group_cpu is
not consulted and, crucially, @new_cpu is not updated. This means the
search is pointless and we return @prev_cpu from select_task_rq_fair.
This is fixed by initialising @new_cpu to @cpu instead of
@prev_cpu.
Signed-off-by: Brendan Jackman <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2608091..f93cb97 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5567,7 +5567,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
int cpu, int prev_cpu, int sd_flag)
{
- int new_cpu = prev_cpu;
+ int new_cpu = cpu;
if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
return prev_cpu;
--
2.7.4
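
A minimal sketch of the control flow (assumed behaviour, not the kernel
function) showing why the initial value matters: when find_idlest_group reports
the local group as idlest at every level, the loop never writes new_cpu, so the
initial value is exactly what the slow path hands back to select_task_rq_fair:

/*
 * Sketch of find_idlest_cpu() when find_idlest_group() returns NULL
 * ("local group is idlest") at every level of the hierarchy.
 */
int find_idlest_cpu_sketch(int cpu, int prev_cpu, int nr_levels)
{
	int new_cpu = cpu;	/* was "= prev_cpu" before this patch */

	(void)prev_cpu;		/* only ever relevant via the initialiser above */

	while (nr_levels--) {
		/* NULL from find_idlest_group(): descend without updating new_cpu. */
	}

	return new_cpu;		/* now @cpu, i.e. a CPU in the local group */
}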
When p is allowed on none of the CPUs in the sched_domain, we
currently return NULL from find_idlest_group, and pointlessly
continue the search on lower sched_domain levels (where p is also not
allowed) before returning prev_cpu regardless (as we have not updated
new_cpu).
Add an explicit check for this case, and a comment to
find_idlest_group. Now when find_idlest_group returns NULL, it always
means that the local group is allowed and idlest.
Signed-off-by: Brendan Jackman <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ce75bb..2608091 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5380,6 +5380,8 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
/*
* find_idlest_group finds and returns the least busy CPU group within the
* domain.
+ *
+ * Assumes p is allowed on at least one CPU in sd.
*/
static struct sched_group *
find_idlest_group(struct sched_domain *sd, struct task_struct *p,
@@ -5567,6 +5569,9 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
{
int new_cpu = prev_cpu;
+ if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
+ return prev_cpu;
+
while (sd) {
struct sched_group *group;
struct sched_domain *tmp;
--
2.7.4
In preparation for changes that would otherwise require adding a new
level of indentation to the while(sd) loop, create a new function
find_idlest_cpu which contains this loop, and rename the existing
find_idlest_cpu to find_idlest_group_cpu.
Code inside the while(sd) loop is unchanged. @new_cpu is added as a
variable in the new function, with the same initial value as the
@new_cpu in select_task_rq_fair.
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Brendan Jackman <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 83 +++++++++++++++++++++++++++++++----------------------
1 file changed, 48 insertions(+), 35 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e..f6e277c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5509,10 +5509,10 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
}
/*
- * find_idlest_cpu - find the idlest cpu among the cpus in group.
+ * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
*/
static int
-find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
+find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
{
unsigned long load, min_load = ULONG_MAX;
unsigned int min_exit_latency = UINT_MAX;
@@ -5561,6 +5561,50 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
+static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
+ int cpu, int prev_cpu, int sd_flag)
+{
+ int new_cpu = prev_cpu;
+
+ while (sd) {
+ struct sched_group *group;
+ struct sched_domain *tmp;
+ int weight;
+
+ if (!(sd->flags & sd_flag)) {
+ sd = sd->child;
+ continue;
+ }
+
+ group = find_idlest_group(sd, p, cpu, sd_flag);
+ if (!group) {
+ sd = sd->child;
+ continue;
+ }
+
+ new_cpu = find_idlest_group_cpu(group, p, cpu);
+ if (new_cpu == -1 || new_cpu == cpu) {
+ /* Now try balancing at a lower domain level of cpu */
+ sd = sd->child;
+ continue;
+ }
+
+ /* Now try balancing at a lower domain level of new_cpu */
+ cpu = new_cpu;
+ weight = sd->span_weight;
+ sd = NULL;
+ for_each_domain(cpu, tmp) {
+ if (weight <= tmp->span_weight)
+ break;
+ if (tmp->flags & sd_flag)
+ sd = tmp;
+ }
+ /* while loop will break here if sd == NULL */
+ }
+
+ return new_cpu;
+}
+
#ifdef CONFIG_SCHED_SMT
static inline void set_idle_cores(int cpu, int val)
@@ -5918,39 +5962,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
- } else while (sd) {
- struct sched_group *group;
- int weight;
-
- if (!(sd->flags & sd_flag)) {
- sd = sd->child;
- continue;
- }
-
- group = find_idlest_group(sd, p, cpu, sd_flag);
- if (!group) {
- sd = sd->child;
- continue;
- }
-
- new_cpu = find_idlest_cpu(group, p, cpu);
- if (new_cpu == -1 || new_cpu == cpu) {
- /* Now try balancing at a lower domain level of cpu */
- sd = sd->child;
- continue;
- }
-
- /* Now try balancing at a lower domain level of new_cpu */
- cpu = new_cpu;
- weight = sd->span_weight;
- sd = NULL;
- for_each_domain(cpu, tmp) {
- if (weight <= tmp->span_weight)
- break;
- if (tmp->flags & sd_flag)
- sd = tmp;
- }
- /* while loop will break here if sd == NULL */
+ } else {
+ new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
}
rcu_read_unlock();
--
2.7.4
On Thu, Aug 31, 2017 at 12:58:00PM +0100, Brendan Jackman wrote:
> When the local group is not allowed we do not modify this_*_load from
> their initial value of 0. That means that the load checks at the end
> of find_idlest_group cause us to incorrectly return NULL. Fixing the
> initial values to ULONG_MAX means we will instead return the idlest
> remote group in that case.
>
Reviewed-by: Josef Bacik <[email protected]>
Thanks,
Josef
On Thu, Aug 31, 2017 at 12:58:01PM +0100, Brendan Jackman wrote:
> When p is allowed on none of the CPUs in the sched_domain, we
> currently return NULL from find_idlest_group, and pointlessly
> continue the search on lower sched_domain levels (where p is also not
> allowed) before returning prev_cpu regardless (as we have not updated
> new_cpu).
>
> Add an explicit check for this case, and a comment to
> find_idlest_group. Now when find_idlest_group returns NULL, it always
> means that the local group is allowed and idlest.
>
Reviewed-by: Josef Bacik <[email protected]>
Thanks,
Josef
On Thu, Aug 31, 2017 at 12:58:02PM +0100, Brendan Jackman wrote:
> find_idlest_group returns NULL when the local group is idlest. The
> caller then continues the find_idlest_group search at a lower level
> of the current CPU's sched_domain hierarchy. find_idlest_group_cpu is
> not consulted and, crucially, @new_cpu is not updated. This means the
> search is pointless and we return @prev_cpu from select_task_rq_fair.
>
> This is fixed by initialising @new_cpu to @cpu instead of
> @prev_cpu.
>
Reviewed-by: Josef Bacik <[email protected]>
Thanks,
Josef
On 31 August 2017 at 13:58, Brendan Jackman <[email protected]> wrote:
> find_idlest_group returns NULL when the local group is idlest. The
> caller then continues the find_idlest_group search at a lower level
> of the current CPU's sched_domain hierarchy. find_idlest_group_cpu is
> not consulted and, crucially, @new_cpu is not updated. This means the
> search is pointless and we return @prev_cpu from select_task_rq_fair.
>
> This is fixed by initialising @new_cpu to @cpu instead of
> @prev_cpu.
>
> Signed-off-by: Brendan Jackman <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Josef Bacik <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Morten Rasmussen <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2608091..f93cb97 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5567,7 +5567,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
> static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
> int cpu, int prev_cpu, int sd_flag)
> {
> - int new_cpu = prev_cpu;
> + int new_cpu = cpu;
>
> if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
> return prev_cpu;
> --
> 2.7.4
>