Message-ID: <1435915747.6658.9.camel@gmail.com>
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for
 BALANCE_WAKE
From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Josef Bacik <jbacik@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>, riel@redhat.com, mingo@redhat.com,
        linux-kernel@vger.kernel.org, morten.rasmussen@arm.com,
        kernel-team <Kernel-team@fb.com>
Date: Fri, 03 Jul 2015 11:29:07 +0200
In-Reply-To: <1435905658.6418.52.camel@gmail.com>
References: <1432761736-22093-1-git-send-email-jbacik@fb.com>
	 <20150528102127.GD3644@twins.programming.kicks-ass.net>
	 <20150528110514.GR18673@twins.programming.kicks-ass.net>
	 <1434087305.3674.26.camel@gmail.com> <5581B70D.2000800@fb.com>
	 <1434588939.3444.25.camel@gmail.com> <55823F33.7040005@fb.com>
	 <1434600765.3393.9.camel@gmail.com> <55957871.7080906@fb.com>
	 <1435905658.6418.52.camel@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org

On Fri, 2015-07-03 at 08:40 +0200, Mike Galbraith wrote:

> Hm.  Seems what this load should like best is if we detect 1:N, skip all
> of the routine gyrations, ie move the N (workers) infrequently, expend
> search cycles frequently only on the 1 (dispatch).
> 
> Ponder..

While taking a refresher peek at the wake_wide() thing, it seems it's not
really paying attention when the waker of many is itself awakened.  I wonder
whether your load would see more benefit if it watched like so, rashly
assuming I didn't wreck it completely (iow, completely untested).

---
 kernel/sched/fair.c |   36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4586,10 +4586,23 @@ static void record_wakee(struct task_str
 		current->wakee_flips >>= 1;
 		current->wakee_flip_decay_ts = jiffies;
 	}
+	if (time_after(jiffies, p->wakee_flip_decay_ts + HZ)) {
+		p->wakee_flips >>= 1;
+		p->wakee_flip_decay_ts = jiffies;
+	}
 
 	if (current->last_wakee != p) {
 		current->last_wakee = p;
 		current->wakee_flips++;
+		/*
+		 * Flip the buddy as well.  It's the ratio of flips
+		 * with a socket size decayed cutoff that determines
+		 * whether the pair are considered to be part of 1:N
+		 * or M*N loads of a size that we need to spread, so
+		 * ensure flips of both load components.  The waker
+		 * of many will have many more flips than its wakees.
+		 */
+		p->wakee_flips++;
 	}
 }
 
@@ -4732,24 +4745,19 @@ static long effective_load(struct task_g
 
 static int wake_wide(struct task_struct *p)
 {
+	unsigned long max = max(current->wakee_flips, p->wakee_flips);
+	unsigned long min = min(current->wakee_flips, p->wakee_flips);
 	int factor = this_cpu_read(sd_llc_size);
 
 	/*
-	 * Yeah, it's the switching-frequency, could means many wakee or
-	 * rapidly switch, use factor here will just help to automatically
-	 * adjust the loose-degree, so bigger node will lead to more pull.
+	 * Yeah, it's a switching-frequency heuristic, and could mean the
+	 * intended many wakees/waker relationship, or rapidly switching
+	 * between a few.  Use factor to try to automatically adjust such
+	 * that the load spreads when it grows beyond what will fit in llc.
 	 */
-	if (p->wakee_flips > factor) {
-		/*
-		 * wakee is somewhat hot, it needs certain amount of cpu
-		 * resource, so if waker is far more hot, prefer to leave
-		 * it alone.
-		 */
-		if (current->wakee_flips > (factor * p->wakee_flips))
-			return 1;
-	}
-
-	return 0;
+	if (min < factor)
+		return 0;
+	return max > min * factor;
 }
 
 static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
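
For anyone wanting to poke at the new cutoff without booting a kernel, here
is a minimal userspace sketch of the comparison the patched wake_wide()
makes.  It is a model, not kernel code: wake_wide_model() and its llc_size
parameter are hypothetical stand-ins for wake_wide() and sd_llc_size, and
the flip counts are passed in by hand rather than read from task_struct.

```c
#include <stdio.h>

/*
 * Userspace model of the proposed wake_wide() test (hypothetical helper,
 * not part of the kernel).  waker_flips and wakee_flips stand in for
 * current->wakee_flips and p->wakee_flips, llc_size for sd_llc_size.
 */
static int wake_wide_model(unsigned long waker_flips,
			   unsigned long wakee_flips,
			   unsigned long llc_size)
{
	/* Order the pair, so it doesn't matter which side is the waker. */
	unsigned long hi = waker_flips > wakee_flips ? waker_flips : wakee_flips;
	unsigned long lo = waker_flips > wakee_flips ? wakee_flips : waker_flips;

	/* The less flippy task must itself be at least llc_size hot. */
	if (lo < llc_size)
		return 0;

	/* Spread only if the flip ratio exceeds the llc size. */
	return hi > lo * llc_size;
}
```

With an assumed llc_size of 8, wake_wide_model(100, 10, 8) and
wake_wide_model(10, 100, 8) both return 1, since the test is symmetric,
while wake_wide_model(100, 4, 8) returns 0 because the smaller count is
below the cutoff, and wake_wide_model(60, 10, 8) returns 0 because the
ratio is too small.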

