Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932949AbcCIO7V (ORCPT ); Wed, 9 Mar 2016 09:59:21 -0500 Received: from mail-wm0-f49.google.com ([74.125.82.49]:34337 "EHLO mail-wm0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753684AbcCIO7N (ORCPT ); Wed, 9 Mar 2016 09:59:13 -0500 From: Matt Fleming To: Peter Zijlstra , Ingo Molnar Cc: linux-kernel@vger.kernel.org, Matt Fleming , Mike Galbraith , Mel Gorman Subject: [PATCH] sched/fair: Add comments to explain select_idle_sibling() Date: Wed, 9 Mar 2016 14:59:08 +0000 Message-Id: <1457535548-15329-1-git-send-email-matt@codeblueprint.co.uk> X-Mailer: git-send-email 2.6.2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2042 Lines: 58 It's not entirely obvious how the main loop in select_idle_sibling() works on first glance. Sprinkle a few comments to explain the design and intention behind the loop based on some conversations with Mike and Peter. Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mel Gorman Signed-off-by: Matt Fleming --- kernel/sched/fair.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1926606ece80..4710e4a2722d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4971,7 +4971,19 @@ static int select_idle_sibling(struct task_struct *p, int target) return i; /* - * Otherwise, iterate the domains and find an elegible idle cpu. + * Otherwise, iterate the domains and find an eligible idle cpu. + * + * A completely idle sched group at higher domains is more + * desirable than an idle group at a lower level, because lower + * domains have smaller groups and usually share hardware + * resources which causes tasks to contend on them, e.g. x86 + * hyperthread siblings in the lowest domain (SMT) can contend + * on the shared cpu pipeline. + * + * However, while we prefer idle groups at higher domains + * finding an idle cpu at the lowest domain is still better than + * returning 'target', which we've already established, isn't + * idle. */ sd = rcu_dereference(per_cpu(sd_llc, target)); for_each_lower_domain(sd) { @@ -4981,11 +4993,16 @@ static int select_idle_sibling(struct task_struct *p, int target) tsk_cpus_allowed(p))) goto next; + /* Ensure the entire group is idle */ for_each_cpu(i, sched_group_cpus(sg)) { if (i == target || !idle_cpu(i)) goto next; } + /* + * It doesn't matter which cpu we pick, the + * whole group is idle. + */ target = cpumask_first_and(sched_group_cpus(sg), tsk_cpus_allowed(p)); goto done; -- 2.6.2