Message-ID: <1337096141.27694.82.camel@twins>
Subject: Re: Plumbers: Tweaking scheduler policy micro-conf RFP
From: Peter Zijlstra
To: Vincent Guittot
Cc: paulmck@linux.vnet.ibm.com, smuckle@quicinc.com, khilman@ti.com,
    Robin.Randhawa@arm.com, suresh.b.siddha@intel.com,
    thebigcorporation@gmail.com, venki@google.com,
    panto@antoniou-consulting.com, mingo@elte.hu, paul.brett@intel.com,
    pdeschrijver@nvidia.com, pjt@google.com, efault@gmx.de,
    fweisbec@gmail.com, geoff@infradead.org, rostedt@goodmis.org,
    tglx@linutronix.de, amit.kucheria@linaro.org, linux-kernel,
    linaro-sched-sig@lists.linaro.org, Morten Rasmussen, Juri Lelli
Date: Tue, 15 May 2012 17:35:41 +0200
References: <1337084609.27020.156.camel@laptop> <1337086834.27020.162.camel@laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2012-05-15 at 17:05 +0200, Vincent Guittot wrote:
> On 15 May 2012 15:00, Peter Zijlstra wrote:
> > On Tue, 2012-05-15 at 14:57 +0200, Vincent Guittot wrote:
> >>
> >> Not sure that nobody cares but it's much more that scheduler,
> >> load_balance and sched_mc are sensible enough that it's difficult to
> >> ensure that a modification will not break everything for someone
> >> else.
> >
> > Thing is, its already broken, there's nothing else to break :-)
> >
>
> sched_mc is the only power-aware knob in the current scheduler.
> It's far from being perfect but it seems to work on some ARM platform at
> least. You mentioned at the scheduler mini-summit that we need a
> cleaner replacement and everybody has agreed on that point. Is anybody
> working on it yet ?

Apparently not..

> and can we discuss at Plumber's what this replacement would look like ?

one knob: sched_balance_policy with tri-state {performance, power, auto}

Where auto should likely look at things like are we on battery and
co-ordinate with cpufreq muck or whatever.

Per domain knobs are insane, large multi-state knobs are insane, the
existing scheme is therefore insane^2. Can you find a sysad who'd like
to explore 3^3=27 states for optimal power/perf for his workload on a
simple 2 socket hyper-threaded machine and 3^4=81 state space for 8
sockets etc..?

As to the exact policy, I think the current 2 (load-balance + wakeup) is
the sensible one..

Also, I still have this pending email from you asking about the topology
setup stuff I really need to reply to.. but people keep sending me bug
reports :/

But really short, look at kernel/sched/core.c:default_topology[]

I'd like to fold sd_init_* into a single function like sd_numa_init();
this would mean all that archs would need to do is provide a simple list
of ever increasing masks that match their topology.

To aid this we can add some SDTL_flags, initially I was thinking of:

 SDTL_SHARE_CORE      -- aka SMT
 SDTL_SHARE_CACHE     -- LLC cache domain (typically multi-core)
 SDTL_SHARE_MEMORY    -- NUMA-node (typically socket)

The 'performance' policy is typically to spread over shared resources so
as to minimize contention on these.

If you want to add some power we need some extra flags, maybe something
like:

 SDTL_SHARE_POWERLINE -- power domain (typically socket)

so you know where the boundaries are where you can turn stuff off so you
know what/where to pack bits.
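A minimal userspace sketch of what such a "list of ever increasing masks"
plus SDTL_ flags might look like. Only the flag names come from the text
above. The bit values, struct topo_level, the toy_* mask helpers, the
choice to attach three flags to the socket level, and topo_is_nested()
are all assumptions made for illustration; this is not kernel code.

```c
#include <stdint.h>

/* Flag names come from the email; the bit values are assumptions. */
#define SDTL_SHARE_CORE      0x01u  /* CPUs in the mask share a core (SMT) */
#define SDTL_SHARE_CACHE     0x02u  /* CPUs in the mask share the LLC */
#define SDTL_SHARE_MEMORY    0x04u  /* CPUs in the mask share a NUMA node */
#define SDTL_SHARE_POWERLINE 0x08u  /* CPUs in the mask share a power domain */

/* Hypothetical per-level descriptor: a mask callback plus SDTL_ flags. */
struct topo_level {
	uint64_t (*mask)(int cpu);
	unsigned int flags;
};

/* Toy machine: 2 sockets x 2 cores x 2 threads, CPUs numbered 0..7. */
static uint64_t toy_smt_mask(int cpu)    { return 0x03ull << (cpu & ~1); }
static uint64_t toy_socket_mask(int cpu) { return 0x0full << (cpu & ~3); }
static uint64_t toy_all_mask(int cpu)    { (void)cpu; return 0xffull; }

/* The whole topology description is one table, smallest level first. */
static const struct topo_level toy_topology[] = {
	{ toy_smt_mask,    SDTL_SHARE_CORE },
	{ toy_socket_mask, SDTL_SHARE_CACHE | SDTL_SHARE_MEMORY |
			   SDTL_SHARE_POWERLINE },
	{ toy_all_mask,    0 },
};

/*
 * Return 1 if, for every CPU, each level's mask contains the CPU itself
 * and everything in the level below it ("ever increasing"), else 0.
 */
static int topo_is_nested(const struct topo_level *tl, int nlevels, int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		uint64_t prev = 1ull << cpu;

		for (int i = 0; i < nlevels; i++) {
			uint64_t m = tl[i].mask(cpu);

			if ((m & prev) != prev)
				return 0;
			prev = m;
		}
	}
	return 1;
}
```

With a table like this, generic code could sanity-check an arch's list
once at boot instead of each level needing its own init function.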
Possibly we also add something like:

 SDTL_PERF_SPREAD     -- spread on performance mode
 SDTL_POWER_PACK      -- pack on power mode

To over-ride the defaults. But ideally I'd leave those until after we've
got the basics working and there is a clear need for them (with a
spread/pack default for perf/power aware).
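A rough sketch of how the single tri-state knob and the spread/pack
defaults could fit together. The enum names, resolve_policy(),
default_placement() and the on_battery input are hypothetical. Mapping
auto to power whenever on battery is only one possible reading of "look
at things like are we on battery". The SDTL_PERF_SPREAD and
SDTL_POWER_PACK overrides are left out, since the text defers them until
the basics work.

```c
#include <stdbool.h>

/* Hypothetical encoding of the proposed tri-state knob. */
enum sched_balance_policy {
	SBP_PERFORMANCE,
	SBP_POWER,
	SBP_AUTO,
};

/* What the balancer would do with tasks across a set of CPUs. */
enum placement {
	PLACE_SPREAD,	/* spread to minimize contention on shared resources */
	PLACE_PACK,	/* pack so that idle power domains can be turned off */
};

/* Assumption: auto collapses to power on battery, performance otherwise. */
static enum sched_balance_policy
resolve_policy(enum sched_balance_policy policy, bool on_battery)
{
	if (policy != SBP_AUTO)
		return policy;
	return on_battery ? SBP_POWER : SBP_PERFORMANCE;
}

/* Default behaviour without overrides: spread for perf, pack for power. */
static enum placement
default_placement(enum sched_balance_policy policy, bool on_battery)
{
	return resolve_policy(policy, on_battery) == SBP_POWER ?
		PLACE_PACK : PLACE_SPREAD;
}
```

The admin-visible state space here is three values regardless of machine
size, against the 27 or 81 combinations that per-domain tri-state knobs
would give.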