Message-ID: <1435905658.6418.52.camel@gmail.com>
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for
 BALANCE_WAKE
From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Josef Bacik <jbacik@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>, riel@redhat.com, mingo@redhat.com,
        linux-kernel@vger.kernel.org, morten.rasmussen@arm.com,
        kernel-team <Kernel-team@fb.com>
Date: Fri, 03 Jul 2015 08:40:58 +0200
In-Reply-To: <55957871.7080906@fb.com>
References: <1432761736-22093-1-git-send-email-jbacik@fb.com>
	 <20150528102127.GD3644@twins.programming.kicks-ass.net>
	 <20150528110514.GR18673@twins.programming.kicks-ass.net>
	 <1434087305.3674.26.camel@gmail.com> <5581B70D.2000800@fb.com>
	 <1434588939.3444.25.camel@gmail.com> <55823F33.7040005@fb.com>
	 <1434600765.3393.9.camel@gmail.com> <55957871.7080906@fb.com>

On Thu, 2015-07-02 at 13:44 -0400, Josef Bacik wrote:

> Now for 3.10 vs 4.0 our request duration time is the same if not 
> slightly better on 4.0, so once the workers are doing their job 
> everything is a-ok.
> 
> The problem is the probability the select queue >= 1 is way different on 
> 4.0 vs 3.10.  Normally this graph looks like an S, it's essentially 0 up 
> to some RPS (requests per second) threshold and then shoots up to 100% 
> after the threshold.  I'll make a table of these graphs that hopefully 
> makes sense, the numbers are different from run to run because of 
> traffic and such, the test and control are both run at the same time. 
> The header is the probability the select queue >=1
> 
> 		25%	50%	75%
> 4.0 plain: 	371	388	402
> control:	386	394	402
> difference:	15	6	0

So control is 3.10?  Virgin?

> So with 4.0 it's basically a straight line, at lower RPS we are getting a 
> higher probability of a select queue >= 1.  We are measuring the cpu 
> delay avg ms thing from the scheduler netlink stuff which is how I 
> noticed it was scheduler related, our cpu delay is way higher on 4.0 
> than it is on 3.10 or 4.0 with the wake idle patch.
> 
> So the next test is NO_PREFER_IDLE.  This is slightly better than 4.0 plain
> 		25%	50%	75%
> NO_PREFER_IDLE:	399	401	414
> control:	385	408	416
> difference:	14	7	2

Hm.  Throttling nohz may make a larger delta.  But never mind that.

> The numbers don't really show it well, but the graphs are closer 
> together, it's slightly more s shaped, but still not great.
> 
> Next is NO_WAKE_WIDE, which is horrible
> 
> 		25%	50%	75%
> NO_WAKE_WIDE:	315	344	369
> control:	373	380	388
> difference:	58	36	19
> 
> This isn't even in the same ballpark, it's a way worse regression than 
> plain.

Ok, this jibes perfectly with the 1:N waker/wakee thingy.

> The next bit is NO_WAKE_WIDE|NO_PREFER_IDLE, which is just as bad
> 
> 		25%	50%	75%
> EVERYTHING:	327	360	383
> control:	381	390	399
> difference:	54	30	19

Ditto.

Hm.  Seems what this load would like best is for us to detect 1:N and then
skip all of the routine gyrations, i.e. move the N (workers) infrequently,
and expend search cycles frequently only on the 1 (dispatch).
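
A rough userspace sketch of that idea, purely illustrative and not kernel
code.  Everything named here (task_stats, record_wakeup, looks_one_to_many,
choose_placement, the fanout threshold) is made up for the example.  It
assumes 1:N can be guessed from how often a task switches wakeup targets:
a dispatcher flips between many workers, while a worker mostly wakes the
same task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task bookkeeping for this sketch; not a kernel struct. */
struct task_stats {
	int last_wakee;			/* id of the task last woken, -1 if none */
	unsigned int wakee_flips;	/* times the wakeup target changed */
};

enum placement {
	PLACE_DEFAULT,		/* no 1:N pattern seen, use the usual path */
	PLACE_KEEP_PREV,	/* worker: leave it where it last ran */
	PLACE_SEARCH_WIDE,	/* dispatcher: spend cycles finding an idle cpu */
};

/* Count a flip whenever the waker switches to a different wakee. */
static void record_wakeup(struct task_stats *waker, int wakee_id)
{
	if (waker->last_wakee != wakee_id) {
		waker->last_wakee = wakee_id;
		waker->wakee_flips++;
	}
}

/* True if "one" flips at least fanout times as often as "many". */
static bool looks_one_to_many(const struct task_stats *one,
			      const struct task_stats *many,
			      unsigned int fanout)
{
	assert(fanout > 0);
	return one->wakee_flips >= fanout &&
	       one->wakee_flips >= many->wakee_flips * fanout;
}

/* Decide how much effort to spend placing the task being woken (wakee). */
static enum placement choose_placement(const struct task_stats *waker,
				       const struct task_stats *wakee,
				       unsigned int fanout)
{
	if (looks_one_to_many(wakee, waker, fanout))
		return PLACE_SEARCH_WIDE;	/* the 1 is being woken */
	if (looks_one_to_many(waker, wakee, fanout))
		return PLACE_KEEP_PREV;		/* one of the N is being woken */
	return PLACE_DEFAULT;
}
```

The flip counts never decay in this sketch, so a task that stops
dispatching would stay classified as a dispatcher forever; anything real
would want to age them.  The fanout threshold is likewise an assumption,
standing in for something like the number of cpus sharing a cache.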

Ponder..

	-Mike
