From: Josef Bacik
Date: Mon, 6 Jul 2015 15:41:02 -0400
To: Mike Galbraith
Cc: Peter Zijlstra, ..., kernel-team
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

On 07/06/2015 02:36 PM, Mike Galbraith wrote:
> On Mon, 2015-07-06 at 10:34 -0400, Josef Bacik wrote:
>> On 07/06/2015 01:13 AM, Mike Galbraith wrote:
>>> Hm.  Piddling with pgbench, which doesn't seem to collapse into a
>>> quivering heap when load exceeds cores these days, deltas weren't all
>>> that impressive, but it does appreciate the extra effort a bit, and a
>>> bit more when clients receive it as well.
>>>
>>> If you test, and have time to piddle, you could try letting wake_wide()
>>> return 1 + sched_feat(WAKE_WIDE_IDLE) instead of adding only if wakee is
>>> the dispatcher.
>>>
>>> Numbers from my little desktop box.
>>>
>>> NO_WAKE_WIDE_IDLE
>>> postgres@homer:~> pgbench.sh
>>> clients  8   tps = 116697.697662
>>> clients 12   tps = 115160.230523
>>> clients 16   tps = 115569.804548
>>> clients 20   tps = 117879.230514
>>> clients 24   tps = 118281.753040
>>> clients 28   tps = 116974.796627
>>> clients 32   tps = 119082.163998   avg 117092.239   1.000
>>>
>>> WAKE_WIDE_IDLE
>>> postgres@homer:~> pgbench.sh
>>> clients  8   tps = 124351.735754
>>> clients 12   tps = 124419.673135
>>> clients 16   tps = 125050.716498
>>> clients 20   tps = 124813.042352
>>> clients 24   tps = 126047.442307
>>> clients 28   tps = 125373.719401
>>> clients 32   tps = 126711.243383   avg 125252.510   1.069   1.000
>>>
>>> WAKE_WIDE_IDLE (clients as well as server)
>>> postgres@homer:~> pgbench.sh
>>> clients  8   tps = 130539.795246
>>> clients 12   tps = 128984.648554
>>> clients 16   tps = 130564.386447
>>> clients 20   tps = 129149.693118
>>> clients 24   tps = 130211.119780
>>> clients 28   tps = 130325.355433
>>> clients 32   tps = 129585.656963   avg 129908.665   1.109   1.037
>
> I had a typo in my script, so those desktop box numbers were all doing
> the same number of clients.
> It doesn't invalidate anything, but the
> individual deltas are just run to run variance.. not to mention that
> single cache box is not all that interesting for this anyway.  That
> happens when interconnect becomes a player.
>
>> I have time for twiddling, we're carrying ye olde WAKE_IDLE until we get
>> this solved upstream and then I'll rip out the old and put in the new,
>> I'm happy to screw around until we're all happy.  I'll throw this in a
>> kernel this morning and run stuff today.  Barring any issues with the
>> testing infrastructure I should have results today.  Thanks,
>
> I'll be interested in your results.  Taking pgbench to a little NUMA
> box, I'm seeing _nada_ outside of variance with master (crap).  I have a
> way to win significantly for _older_ kernels, and that win over master
> _may_ provide some useful insight, but I don't trust postgres/pgbench as
> far as I can toss the planet, so don't have a warm fuzzy about trying to
> use it to approximate your real world load.
>
> BTW, what's your topology look like (numactl --hardware).
>

So the NO_WAKE_WIDE_IDLE results are very good: almost the same as the
baseline, with a slight regression at lower RPS and a slight improvement
at high RPS.  I'm running with WAKE_WIDE_IDLE set now; that should be
done soonish, and then I'll do the 1 + sched_feat(WAKE_WIDE_IDLE) thing
next (a rough sketch of that tweak is at the end of this mail), so those
results should come in the morning.

Here is the NUMA information from one of the boxes in the test cluster:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
node 0 size: 15890 MB
node 0 free: 2651 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
node 1 size: 16125 MB
node 1 free: 2063 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Thanks,

Josef
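For reference, a rough sketch of the tweak being discussed: wake_wide()
returning 1 + sched_feat(WAKE_WIDE_IDLE) unconditionally, rather than
adding the bonus only when the wakee is the dispatcher.  It assumes the
wakee_flips/llc-size heuristic used elsewhere in this series and an
experimental WAKE_WIDE_IDLE feature bit; the real test patch may differ,
and how select_task_rq_fair() consumes a return value of 2 is not shown.

/* kernel/sched/features.h: experimental feature bit (assumed, default off) */
SCHED_FEAT(WAKE_WIDE_IDLE, false)

/* kernel/sched/fair.c */
static int wake_wide(struct task_struct *p)
{
	unsigned int master = current->wakee_flips;
	unsigned int slave = p->wakee_flips;
	int factor = this_cpu_read(sd_llc_size);

	/* Treat whichever task has the most wakee flips as the dispatcher. */
	if (master < slave)
		swap(master, slave);

	/* Not enough flipping going on: stick with the affine wakeup path. */
	if (slave < factor || master < slave * factor)
		return 0;

	/*
	 * Suggested experiment: always fold in the WAKE_WIDE_IDLE hint,
	 * instead of adding it only when the wakee is the dispatcher.
	 */
	return 1 + sched_feat(WAKE_WIDE_IDLE);
}

With the feature bit clear this degenerates to a plain "return 1", so the
two behaviours can be compared at run time by flipping the bit in
/sys/kernel/debug/sched_features (with CONFIG_SCHED_DEBUG enabled).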