Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932269AbcKPMNn (ORCPT );
	Wed, 16 Nov 2016 07:13:43 -0500
Received: from terminus.zytor.com ([198.137.202.10]:35390 "EHLO
	terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751802AbcKPMNl (ORCPT );
	Wed, 16 Nov 2016 07:13:41 -0500
Date: Wed, 16 Nov 2016 04:13:06 -0800
From: tip-bot for Morten Rasmussen
Message-ID:
Cc: tglx@linutronix.de, hpa@zytor.com, torvalds@linux-foundation.org,
	morten.rasmussen@arm.com, linux-kernel@vger.kernel.org,
	mingo@kernel.org, peterz@infradead.org
Reply-To: tglx@linutronix.de, peterz@infradead.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org, hpa@zytor.com,
	morten.rasmussen@arm.com, torvalds@linux-foundation.org
In-Reply-To: <1476452472-24740-3-git-send-email-morten.rasmussen@arm.com>
References: <1476452472-24740-3-git-send-email-morten.rasmussen@arm.com>
To: linux-tip-commits@vger.kernel.org
Subject: [tip:sched/core] sched/fair: Consider spare capacity in find_idlest_group()
Git-Commit-ID: 6a0b19c0f39a7a7b7fb77d3867a733136ff059a3
X-Mailer: tip-git-log-daemon
Robot-ID:
Robot-Unsubscribe: Contact to get blacklisted from these emails
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4790
Lines: 138

Commit-ID:  6a0b19c0f39a7a7b7fb77d3867a733136ff059a3
Gitweb:     http://git.kernel.org/tip/6a0b19c0f39a7a7b7fb77d3867a733136ff059a3
Author:     Morten Rasmussen
AuthorDate: Fri, 14 Oct 2016 14:41:08 +0100
Committer:  Ingo Molnar
CommitDate: Wed, 16 Nov 2016 10:29:05 +0100

sched/fair: Consider spare capacity in find_idlest_group()

In low-utilization scenarios, comparing relative loads in
find_idlest_group() doesn't always lead to the optimal choice. Systems
with groups containing different numbers of cpus and/or cpus of
different compute capacity are significantly better off when
considering spare capacity rather than relative load in those
scenarios.

In addition to the existing load-based search, an alternative
spare-capacity-based candidate sched_group is found and selected
instead if sufficient spare capacity exists. If not, the existing
behaviour is preserved.

Signed-off-by: Morten Rasmussen
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dietmar.eggemann@arm.com
Cc: freedom.tan@mediatek.com
Cc: keita.kobayashi.ym@renesas.com
Cc: mgalbraith@suse.de
Cc: sgurrappadi@nvidia.com
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1476452472-24740-3-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 50 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b05d691..1ad3706 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5202,6 +5202,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	return 1;
 }
 
+static inline int task_util(struct task_struct *p);
+static int cpu_util_wake(int cpu, struct task_struct *p);
+
+static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
+{
+	return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
+}
+
 /*
  * find_idlest_group finds and returns the least busy CPU group within the
  * domain.
@@ -5211,7 +5219,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		  int this_cpu, int sd_flag)
 {
 	struct sched_group *idlest = NULL, *group = sd->groups;
+	struct sched_group *most_spare_sg = NULL;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
+	unsigned long most_spare = 0, this_spare = 0;
 	int load_idx = sd->forkexec_idx;
 	int imbalance = 100 + (sd->imbalance_pct-100)/2;
 
@@ -5219,7 +5229,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		load_idx = sd->wake_idx;
 
 	do {
-		unsigned long load, avg_load;
+		unsigned long load, avg_load, spare_cap, max_spare_cap;
 		int local_group;
 		int i;
 
@@ -5231,8 +5241,12 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		local_group = cpumask_test_cpu(this_cpu,
 					       sched_group_cpus(group));
 
-		/* Tally up the load of all CPUs in the group */
+		/*
+		 * Tally up the load of all CPUs in the group and find
+		 * the group containing the CPU with most spare capacity.
+		 */
 		avg_load = 0;
+		max_spare_cap = 0;
 
 		for_each_cpu(i, sched_group_cpus(group)) {
 			/* Bias balancing toward cpus of our domain */
@@ -5242,6 +5256,11 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 				load = target_load(i, load_idx);
 
 			avg_load += load;
+
+			spare_cap = capacity_spare_wake(i, p);
+
+			if (spare_cap > max_spare_cap)
+				max_spare_cap = spare_cap;
 		}
 
 		/* Adjust by relative CPU capacity of the group */
@@ -5249,12 +5268,33 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 
 		if (local_group) {
 			this_load = avg_load;
-		} else if (avg_load < min_load) {
-			min_load = avg_load;
-			idlest = group;
+			this_spare = max_spare_cap;
+		} else {
+			if (avg_load < min_load) {
+				min_load = avg_load;
+				idlest = group;
+			}
+
+			if (most_spare < max_spare_cap) {
+				most_spare = max_spare_cap;
+				most_spare_sg = group;
+			}
 		}
 	} while (group = group->next, group != sd->groups);
 
+	/*
+	 * The cross-over point between using spare capacity or least load
+	 * is too conservative for high utilization tasks on partially
+	 * utilized systems if we require spare_capacity > task_util(p),
+	 * so we allow for some task stuffing by using
+	 * spare_capacity > task_util(p)/2.
+	 */
+	if (this_spare > task_util(p) / 2 &&
+	    imbalance*this_spare > 100*most_spare)
+		return NULL;
+	else if (most_spare > task_util(p) / 2)
+		return most_spare_sg;
+
 	if (!idlest || 100*this_load < imbalance*min_load)
 		return NULL;
 	return idlest;
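
For readers who want to poke at the new heuristic outside the kernel, here is a
minimal standalone sketch of the decision order at the end of the patched
find_idlest_group(). It is an illustration only: struct group_stats, the
pick_group() helper, the enum return (in place of the kernel's NULL-vs-group
pointers), the imbalance_pct value of 125 and the example numbers are all made
up, and the !idlest check is dropped for brevity.

/*
 * Standalone illustration of the spare-capacity crossover added to
 * find_idlest_group().  Names and numbers are invented for the example;
 * only the decision order mirrors the patch.
 */
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the scheduler's group statistics. */
struct group_stats {
	unsigned long this_load;	/* avg load of the local group */
	unsigned long min_load;		/* lowest avg load among remote groups */
	unsigned long this_spare;	/* max spare capacity in the local group */
	unsigned long most_spare;	/* max spare capacity among remote groups */
};

enum choice { STAY_LOCAL, USE_MOST_SPARE, USE_LEAST_LOADED };

/*
 * Decision order as in the patched function:
 * 1. Stay local if the local group has enough spare capacity and the best
 *    remote spare capacity does not beat it by more than the imbalance margin.
 * 2. Otherwise pick the group with the most spare capacity if that spare
 *    exceeds half the task's utilization (allowing some "task stuffing").
 * 3. Otherwise fall back to the original relative-load comparison.
 */
static enum choice pick_group(const struct group_stats *s,
			      unsigned long task_util, int imbalance_pct)
{
	int imbalance = 100 + (imbalance_pct - 100) / 2;

	if (s->this_spare > task_util / 2 &&
	    imbalance * s->this_spare > 100 * s->most_spare)
		return STAY_LOCAL;

	if (s->most_spare > task_util / 2)
		return USE_MOST_SPARE;

	if (100 * s->this_load < imbalance * s->min_load)
		return STAY_LOCAL;

	return USE_LEAST_LOADED;
}

int main(void)
{
	/* Example: an asymmetric system, numbers picked for illustration. */
	struct group_stats s = {
		.this_load  = 400,	/* local (small-CPU) group */
		.min_load   = 390,	/* best remote (big-CPU) group */
		.this_spare = 100,	/* small CPUs are almost full */
		.most_spare = 600,	/* big CPUs are mostly idle */
	};
	unsigned long task_util = 300;

	switch (pick_group(&s, task_util, 125)) {
	case STAY_LOCAL:
		printf("stay in the local group\n");
		break;
	case USE_MOST_SPARE:
		printf("migrate to the group with the most spare capacity\n");
		break;
	case USE_LEAST_LOADED:
		printf("migrate to the least loaded group\n");
		break;
	}
	return 0;
}

With these numbers the pure load comparison (100*400 < 112*390) would have
kept the task in its local group, but the local spare capacity (100) is below
task_util/2 (150) while the remote group offers 600, so the task is sent to
the group with the most spare capacity, which is the behaviour the patch aims
for in low-utilization scenarios.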