Date: Tue, 12 Mar 2013 09:48:57 +0100
From: Ingo Molnar
To: Michael Wang
Cc: Peter Zijlstra, LKML, Mike Galbraith, Namhyung Kim, Alex Shi,
    Paul Turner, Andrew Morton, "Nikunj A. Dadhania", Ram Pai
Subject: Re: [PATCH] sched: wakeup buddy
Message-ID: <20130312084857.GA4859@gmail.com>
In-Reply-To: <513EC47E.9040602@linux.vnet.ibm.com>
References: <5136EB06.2050905@linux.vnet.ibm.com>
    <1362645372.2606.11.camel@laptop> <20130311082105.GB12742@gmail.com>
    <513DA076.80009@linux.vnet.ibm.com> <20130311094031.GA14221@gmail.com>
    <513EC47E.9040602@linux.vnet.ibm.com>

* Michael Wang wrote:

> On 03/11/2013 05:40 PM, Ingo Molnar wrote:
> >
> > * Michael Wang wrote:
> >
> >> Hi, Ingo
> >>
> >> On 03/11/2013 04:21 PM, Ingo Molnar wrote:
> >> [snip]
> >>>
> >>> I have actually written the prctl() approach before, for instrumentation
> >>> purposes, and it does wonders to system analysis.
> >>
> >> The idea sounds great, we could get many new info to implement more
> >> smart scheduler, that's amazing :)
> >>
> >>>
> >>> Any objections?
> >>
> >> Just one concern, may be I have misunderstand you, but will it cause
> >> trouble if the prctl() was indiscriminately used by some applications,
> >> will we get fake data?
> >
> > It's their problem: overusing it will increase their CPU overhead.
> > The two boundary worst-cases are that they either call it too frequently or
> > too rarely:
> >
> >  - too frequently: it approximates the current cpu-runtime work metric
> >
> >  - too infrequently: we just ignore it and fall back to a runtime metric
> >    if it does not change.
> >
> > It's not like it can be used to get preferential treatment - we don't ever
> > balance other tasks against these tasks based on work throughput, we try
> > to maximize this workload's work throughput.
> >
> > What could happen is if an app is 'optimized' for a buggy scheduler by
> > changing the work metric frequency. We offer no guarantee - apps will be
> > best off (and users will be least annoyed) if apps honestly report their
> > work metric.
> >
> > Instrumentation/stats/profiling will also double check the correctness of
> > this data: if developers/users start relying on the work metric as a
> > substitute benchmark number, then app writers will have an additional
> > incentive to make them correct.
>
> I see, I could not figure out how to wisely using the info currently,
> but I have the feeling that it will make scheduler very different ;-)
>
> May be we could implement the API and get those info ready firstly
> (along with the new sched-pipe which provide work tick info), then think
> about the way to use them in scheduler, is there any patches on the way?

Absolutely.

Beyond the new prctl no new API is needed: a perf soft event could be
added, and/or a tracepoint. Then perf stat and perf record could be used
with it. 'perf bench' could be extended to generate the work tick in its
'perf bench sched ...' workloads - and for 'perf bench mem numa' as well.

vsyscall-accelerating it could be a separate, more complex step: it needs
a per thread writable vsyscall data area to make the overhead to
applications near zero. Performance critical apps won't call an extra
syscall.
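
To make the two ideas above concrete, here is a rough userspace sketch. It
is an illustration only, not a patch and not an existing kernel interface.
It assumes a per-thread counter that the application bumps without any
syscall (standing in for the writable vsyscall data area), and a sampler
that uses the counter delta when it changed and otherwise falls back to a
runtime delta. All names (struct work_area, work_tick(), pick_metric())
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-thread writable vsyscall data area. */
struct work_area {
	uint64_t work_ticks;	/* bumped by the application, no syscall */
};

static _Thread_local struct work_area area;

/* Application side: call once per completed unit of work. */
static inline void work_tick(void)
{
	area.work_ticks++;
}

/* Sampler side: state kept between two sampling periods. */
struct sample_state {
	uint64_t last_ticks;
};

/*
 * Pick the metric for one sampling period.
 *
 * If the work-tick counter moved since the last sample, return the tick
 * delta and set *used_ticks to 1. If it did not move (the app reports
 * too rarely, or not at all), fall back to the runtime delta and set
 * *used_ticks to 0.
 */
static uint64_t pick_metric(struct sample_state *s, uint64_t ticks_now,
			    uint64_t runtime_delta_ns, int *used_ticks)
{
	uint64_t delta;

	assert(s && used_ticks);

	delta = ticks_now - s->last_ticks;
	s->last_ticks = ticks_now;
	*used_ticks = (delta != 0);

	return delta ? delta : runtime_delta_ns;
}
```

The application only pays for an increment of thread-local memory per work
unit; an app that never calls work_tick() simply keeps getting the runtime
metric, as in the "too infrequently" case above.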

Thanks,

	Ingo