Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S261298AbVDDR3w (ORCPT ); Mon, 4 Apr 2005 13:29:52 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S261300AbVDDR3w (ORCPT ); Mon, 4 Apr 2005 13:29:52 -0400
Received: from omx2-ext.sgi.com ([192.48.171.19]:50141 "EHLO omx2.sgi.com")
	by vger.kernel.org with ESMTP id S261298AbVDDR31 (ORCPT );
	Mon, 4 Apr 2005 13:29:27 -0400
Date: Mon, 4 Apr 2005 10:27:34 -0700
From: Paul Jackson
To: Ingo Molnar
Cc: kenneth.w.chen@intel.com, torvalds@osdl.org, nickpiggin@yahoo.com.au,
	akpm@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] sched: auto-tune migration costs [was: Re: Industry db
	benchmark result on recent 2.6 kernels]
Message-Id: <20050404102734.1fbba019.pj@engr.sgi.com>
In-Reply-To: <20050404113743.GA28994@elte.hu>
References: <20050403070415.GA18893@elte.hu>
	<200504040111.j341BUg31885@unix-os.sc.intel.com>
	<20050404113743.GA28994@elte.hu>
Organization: SGI
X-Mailer: Sylpheed version 1.0.0 (GTK+ 1.2.10; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 11189
Lines: 184

Ingo wrote:
> i've attached the latest snapshot.

I ran your latest snapshot on a 64 CPU (well, 62 - one node wasn't
working) system. I made one change - chop the matrix lines at 8 terms.
It's a hack - don't know if it's a good idea. But the long lines were
hard to read (and would only get worse on a 512). And I had a fear,
probably unfounded, that the long lines could slow things down.

It built and ran fine, exactly as provided, against 2.6.12-rc1-mm4.

I probably have the unchopped matrix output in my screenlog file, if
you want it. Though, given that the matrix is more or less symmetric,
I wasn't seeing much value in the part I chopped.

It took 24 seconds - a little painful, but booting this system takes
a few minutes, so 24 seconds is not fatal - just painful.

The maximum-finding code, which stops scanning after the max has been
passed, works fine. If it had been (impossibly) perfect, stopping right
at the max, it would have been perhaps 30% faster, so there is not a
huge amount to be gained from trying to fine tune the scan termination
logic.

I can imagine that one could trim this time by doing a couple of scans,
the first one at lower density (perhaps just one out of four sizes
considered), then the second scan at full density, around the maximum
found by the first. However this would be less robust, and yet more
logic.

Or perhaps, long shot, one could get fancy with some parameterized
curve fitting. If some equation is a reasonably good fit for the
function being sampled here, then just a low density scan through the
max could be used to estimate the coefficients of whatever the equation
was, and the equation used to find the maximum, instead of the samples.

This would be fun to play with, but I can't now - other duties are
calling.

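The two-scan idea above might look roughly like the sketch below. It is
only an illustration in user-space C, not code from the patch: the names
two_pass_max, measure_fn, scan_result, table_cost and log_costs are all
made up here, the "measurement" is a table lookup of the 21 [0][2] costs
reported in this message, and the early stop after the max has been passed
is left out to keep it short.

```c
#include <stddef.h>

/* Hypothetical callback: cost measured for the idx-th candidate size. */
typedef unsigned long long (*measure_fn)(size_t idx, const void *ctx);

struct scan_result {
	size_t index;	/* index of the largest cost seen */
	size_t samples;	/* number of measurements taken */
};

/* The 21 [0][2] costs from the calibration output, in scan order. */
const unsigned long long log_costs[21] = {
	12361880, 13175591, 13718647, 14356800, 15522156, 16487934,
	17356154, 18144452, 18934638, 19965884, 21067441, 22303727,
	23453867, 23406625, 23586123, 23519823, 22619385, 21998024,
	20705771, 17244361, 14644331,
};

/* Stand-in for a real measurement: look the cost up in a table. */
unsigned long long table_cost(size_t idx, const void *ctx)
{
	return ((const unsigned long long *)ctx)[idx];
}

/*
 * Pass 1 measures every stride-th candidate. Pass 2 measures every
 * candidate within (stride - 1) steps of the coarse maximum.
 */
struct scan_result two_pass_max(measure_fn measure, const void *ctx,
				size_t n, size_t stride)
{
	struct scan_result r = { 0, 0 };
	unsigned long long best = 0, c;
	size_t i, coarse, lo, hi;

	if (n == 0)
		return r;
	if (stride == 0)
		stride = 1;

	/* Pass 1: low density scan. */
	for (i = 0; i < n; i += stride) {
		c = measure(i, ctx);
		r.samples++;
		if (c > best) {
			best = c;
			r.index = i;
		}
	}

	/* Pass 2: full density scan around the coarse maximum. */
	coarse = r.index;
	lo = coarse >= stride - 1 ? coarse - (stride - 1) : 0;
	hi = coarse + (stride - 1);
	if (hi > n - 1)
		hi = n - 1;
	for (i = lo; i <= hi; i++) {
		if (i == coarse)
			continue;	/* already measured in pass 1 */
		c = measure(i, ctx);
		r.samples++;
		if (c > best) {
			best = c;
			r.index = i;
		}
	}
	return r;
}
```

Called as two_pass_max(table_cost, log_costs, 21, 4), this takes 12
measurements instead of 21 and still lands on index 14, the 6450449 byte
working set. That holds for this one set of numbers only; a noisy coarse
pass could pick the wrong neighborhood, which is the robustness concern
raised above.
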
The one change:

diff -Naurp auto-tune_migration_costs/kernel/sched.c auto-tune_migration_costs_chopped/kernel/sched.c
--- auto-tune_migration_costs/kernel/sched.c	2005-04-04 09:11:43.000000000 -0700
+++ auto-tune_migration_costs_chopped/kernel/sched.c	2005-04-04 09:11:22.000000000 -0700
@@ -5287,6 +5287,7 @@ void __devinit calibrate_migration_costs
 			distance = domain_distance(cpu1, cpu2);
 			max_distance = max(max_distance, distance);
 			cost = migration_cost[distance];
+			if (cpu2 < 8)
 			printk(" %2ld.%ld(%ld)", (long)cost / 1000000,
 				((long)cost / 100000) % 10, distance);
 		}

With this change, the output was:

Memory: 243350592k/244270096k available (7182k code, 921216k reserved, 3776k data, 368k init)
McKinley Errata 9 workaround not needed; disabling it
Dentry cache hash table entries: 33554432 (order: 14, 268435456 bytes)
Inode-cache hash table entries: 16777216 (order: 13, 134217728 bytes)
Mount-cache hash table entries: 1024
Boot processor id 0x0/0x40
Brought up 62 CPUs
Total of 62 processors activated (138340.68 BogoMIPS).
-> [0][2][3145728] 12.3 [ 12.3] (1): (12361880 6180940)
-> [0][2][3311292] 13.1 [ 13.1] (1): (13175591 3497325)
-> [0][2][3485570] 13.7 [ 13.7] (1): (13718647 2020190)
-> [0][2][3669021] 14.3 [ 14.3] (1): (14356800 1329171)
-> [0][2][3862127] 15.5 [ 15.5] (1): (15522156 1247263)
-> [0][2][4065396] 16.4 [ 16.4] (1): (16487934 1106520)
-> [0][2][4279364] 17.3 [ 17.3] (1): (17356154 987370)
-> [0][2][4504593] 18.1 [ 18.1] (1): (18144452 887834)
-> [0][2][4741676] 18.9 [ 18.9] (1): (18934638 839010)
-> [0][2][4991237] 19.9 [ 19.9] (1): (19965884 935128)
-> [0][2][5253933] 21.0 [ 21.0] (1): (21067441 1018342)
-> [0][2][5530455] 22.3 [ 22.3] (1): (22303727 1127314)
-> [0][2][5821531] 23.4 [ 23.4] (1): (23453867 1138727)
-> [0][2][6127927] 23.4 [ 23.4] (1): (23406625 592984)
-> [0][2][6450449] 23.5 [ 23.5] (1): (23586123 386241)
-> [0][2][6789946] 23.5 [ 23.5] (1): (23519823 226270)
-> [0][2][7147311] 22.6 [ 23.5] (1): (22619385 563354)
-> [0][2][7523485] 21.9 [ 23.5] (1): (21998024 592357)
-> [0][2][7919457] 20.7 [ 23.5] (1): (20705771 942305)
-> [0][2][8336270] 17.2 [ 23.5] (1): (17244361 2201857)
-> [0][2][8775021] 14.6 [ 23.5] (1): (14644331 2400943)
-> found max.
[0][2] working set size found: 6450449, cost: 23586123
-> [0][32][3145728] 17.8 [ 17.8] (2): (17848927 8924463)
-> [0][32][3311292] 18.8 [ 18.8] (2): (18811236 4943386)
-> [0][32][3485570] 19.7 [ 19.7] (2): (19779337 2955743)
-> [0][32][3669021] 20.8 [ 20.8] (2): (20811634 1994020)
-> [0][32][3862127] 21.9 [ 21.9] (2): (21919806 1551096)
-> [0][32][4065396] 23.0 [ 23.0] (2): (23075814 1353552)
-> [0][32][4279364] 24.2 [ 24.2] (2): (24267691 1272714)
-> [0][32][4504593] 25.5 [ 25.5] (2): (25546809 1275916)
-> [0][32][4741676] 26.8 [ 26.8] (2): (26886375 1307741)
-> [0][32][4991237] 28.2 [ 28.2] (2): (28291601 1356483)
-> [0][32][5253933] 29.5 [ 29.5] (2): (29587239 1326060)
-> [0][32][5530455] 30.6 [ 30.6] (2): (30669228 1204024)
-> [0][32][5821531] 30.9 [ 30.9] (2): (30969069 751932)
-> [0][32][6127927] 30.3 [ 30.9] (2): (30353322 683839)
-> [0][32][6450449] 29.3 [ 30.9] (2): (29381521 827820)
-> [0][32][6789946] 27.4 [ 30.9] (2): (27459958 1374691)
-> [0][32][7147311] 26.4 [ 30.9] (2): (26403308 1215670)
-> [0][32][7523485] 23.9 [ 30.9] (2): (23967782 1825598)
-> [0][32][7919457] 19.4 [ 30.9] (2): (19483305 3155037)
-> found max.
[0][32] working set size found: 5821531, cost: 30969069
---------------------
| migration cost matrix (max_cache_size: 6291456, cpu: -1 MHz):
---------------------
        [00]     [01]     [02]     [03]     [04]     [05]     [06]     [07]     [08]     [09]     [10]     [11]     [12]     [13]     [14]     [15]     [16]     [17]     [18]     [19]     [20]     [21]     [22]     [23]     [24]     [25]     [26]     [27]     [28]     [29]     [30]     [31]     [32]     [33]     [34]     [35]     [36]     [37]     [38]     [39]     [40]     [41]     [42]     [43]     [44]     [45]     [46]     [47]     [48]     [49]     [50]     [51]     [52]     [53]     [54]     [55]     [56]     [57]     [58]     [59]     [60]     [61]
[00]:     -      0.0(0)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)
[01]:   0.0(0)     -     47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)
[02]:  47.1(1)  47.1(1)     -      0.0(0)  47.1(1)  47.1(1)  47.1(1)  47.1(1)
[03]:  47.1(1)  47.1(1)   0.0(0)     -     47.1(1)  47.1(1)  47.1(1)  47.1(1)
[04]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -      0.0(0)  47.1(1)  47.1(1)
[05]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)   0.0(0)     -     47.1(1)  47.1(1)
[06]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -      0.0(0)
[07]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)   0.0(0)     -
[08]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[09]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[10]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[11]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[12]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[13]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[14]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[15]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[16]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[17]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[18]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[19]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[20]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[21]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[22]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[23]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[24]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[25]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[26]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[27]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[28]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[29]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[30]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[31]:  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)  47.1(1)     -
[32]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[33]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[34]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[35]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[36]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[37]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[38]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[39]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[40]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[41]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[42]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[43]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[44]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[45]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[46]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[47]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  47.1(1)  47.1(1)     -
[48]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[49]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[50]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[51]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[52]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[53]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[54]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[55]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[56]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[57]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[58]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[59]:  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[60]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
[61]:  61.9(2)  61.9(2)  47.1(1)  47.1(1)  61.9(2)  61.9(2)  61.9(2)  61.9(2)     -
--------------------------------
| cacheflush times [3]: 0.0 (-1) 47.1 (47172246) 61.9 (61938138)
| calibration delay: 24 seconds
--------------------------------

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.650.933.1373, 1.925.600.0401