Date: Tue, 14 Apr 2009 15:28:03 +0530
From: Gautham R Shenoy
To: Peter Zijlstra
Cc: Ingo Molnar, Vaidyanathan Srinivasan, linux-kernel@vger.kernel.org,
    Balbir Singh, Suresh Siddha, Andi Kleen, Randy Dunlap
Subject: Re: [RFC PATCH v2 0/2] sched: Nominate a power-efficient ILB
Message-ID: <20090414095803.GA11553@in.ibm.com>
Reply-To: ego@in.ibm.com
References: <20090414045356.7645.33369.stgit@sofia.in.ibm.com>
    <1239702484.21985.6857.camel@twins>
In-Reply-To: <1239702484.21985.6857.camel@twins>

On Tue, Apr 14, 2009 at 11:48:04AM +0200, Peter Zijlstra wrote:
> On Tue, 2009-04-14 at 10:25 +0530, Gautham R Shenoy wrote:
> > Hi,
> >
> > This is the second iteration of the patchset which aims at improving
> > the idle-load balancer nomination logic, by taking the system topology
> > into consideration.
> >
> > Changes from v1 (found here: http://lkml.org/lkml/2009/4/2/246)
> >    o Fixed the kernel-doc style comments.
> >    o Renamed a variable to better reflect its usage.
> >
> > Background
> > ======================================
> > An idle-load balancer is an idle CPU which does not turn off its sched_ticks
> > and performs load-balancing on behalf of the other idle CPUs.
> > Currently, this idle load balancer is nominated as the
> > first_cpu(nohz.cpu_mask).
> >
> > The drawback of the current method is that the CPU numbering in the
> > cores/packages need not necessarily be sequential. For example, on a
> > two-socket, quad-core system, the CPU numbering can be as follows:
> >
> > |-------------------------------|    |-------------------------------|
> > |               |               |    |               |               |
> > |       0       |       2       |    |       1       |       3       |
> > |-------------------------------|    |-------------------------------|
> > |               |               |    |               |               |
> > |       4       |       6       |    |       5       |       7       |
> > |-------------------------------|    |-------------------------------|
> >
> > Now, the other power-savings settings such as the sched_mc/smt_power_savings
> > and the power-aware IRQ balancer try to balance tasks/IRQs by taking
> > the system topology into consideration, with the intention of keeping
> > as many "power-domains" (cores/packages) as possible in the low-power state.
> >
> > The current idle-load-balancer nomination does not necessarily align with
> > this policy. For example, we could have tasks and interrupts largely running
> > on the first package with the intention of keeping the second package idle.
> > Hence, CPU 0 may be busy. The first_cpu in the nohz.cpu_mask then happens to
> > be CPU1, which in turn becomes nominated as the idle-load balancer. CPU1,
> > being from the 2nd package, would in turn prevent the 2nd package from going
> > into a deeper sleep state.
> >
> > Instead the role of the idle-load balancer could have been assumed by an
> > idle CPU from the first package, thereby helping the second package go
> > completely idle.
> >
> > This patchset has been tested with 2.6.30-rc1 on a two-socket
> > quad-core system with the topology as mentioned above.
> >
> > |-----------------------------------------------------------------------------|
> > |                 With Patchset + sched_mc_power_savings = 1                  |
> > |-----------------------------------------------------------------------------|
> > |make -j2 options| time taken | LOC timer interrupts  | LOC timer interrupts  |
> > |                |            |     on Package 0      |     on Package 1      |
> > |-----------------------------------------------------------------------------|
> > |taskset -c 0,2  |            | CPU0      | CPU2      | CPU1      | CPU3      |
> > |                | 227.234s   | 56969     | 57080     | 1003      | 588       |
> > |                |            |-----------------------------------------------|
> > |                |            | CPU4      | CPU6      | CPU5      | CPU7      |
> > |                |            | 55995     | 703       | 583       | 600       |
> > |-----------------------------------------------------------------------------|
> > |taskset -c 1,3  |            | CPU0      | CPU2      | CPU1      | CPU3      |
> > |                | 227.136s   | 1109      | 611       | 57074     | 57091     |
> > |                |            |-----------------------------------------------|
> > |                |            | CPU4      | CPU6      | CPU5      | CPU7      |
> > |                |            | 709       | 637       | 56133     | 587       |
> > |-----------------------------------------------------------------------------|
> >
> > We see here that the idle load balancer is chosen from the package which is
> > busy. In the first case, it's CPU4 and in the second case it's CPU5.
> >
> > |-----------------------------------------------------------------------------|
> > |                 With Patchset + sched_mc_power_savings = 1                  |
                      ^^^^ Without
> > |-----------------------------------------------------------------------------|
> > |make -j2 options| time taken | LOC timer interrupts  | LOC timer interrupts  |
> > |                |            |     on Package 0      |     on Package 1      |
> > |-----------------------------------------------------------------------------|
> > |taskset -c 0,2  |            | CPU0      | CPU2      | CPU1      | CPU3      |
> > |                | 228.786s   | 59094     | 61994     | 13984     | 43652     |
> > |                |            |-----------------------------------------------|
> > |                |            | CPU4      | CPU6      | CPU5      | CPU7      |
> > |                |            | 1827      | 734       | 748       | 760       |
> > |-----------------------------------------------------------------------------|
> > |taskset -c 1,3  |            | CPU0      | CPU2      | CPU1      | CPU3      |
> > |                | 228.435s   | 57013     | 876       | 58596     | 61633     |
> > |                |            |-----------------------------------------------|
> > |                |            | CPU4      | CPU6      | CPU5      | CPU7      |
> > |                |            | 772       | 1133      | 850       | 910       |
> > |-----------------------------------------------------------------------------|
> >
> > Here, we see that the idle load balancer is chosen from the other package,
> > despite choosing sched_mc_power_savings = 1. In the first case, we have
> > CPU1 and CPU3 sharing the responsibility among themselves. In the second case,
> > it's CPU0 and CPU6, which assume that role.
>
> Both tables above claim to be _with_ the patches :-), from the
> accompanying text one can deduce it's the bottom one that is without.

Sorry, I copy-pasted the 2nd table from the first and updated only the values.

>
> Patches look straightforward enough, seems good stuff.

Thanks for the review!

>
> Thanks!

-- 
Thanks and Regards
gautham
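
Illustrative note: the nomination problem described in the Background section
can be modelled in a few lines. The sketch below is not the kernel code and is
not taken from the patches; it is a plain Python model, assuming the CPU
numbering from the diagram in the thread. The names PACKAGE_OF, first_idle_cpu
and topology_aware_ilb are hypothetical, and the "lowest-numbered idle CPU in a
busy package" rule is an assumption made only to show the idea.

```python
# Illustrative model only; NOT the kernel implementation or the patch logic.
# CPU -> package map for the two-socket, quad-core example in the thread.
PACKAGE_OF = {0: 0, 2: 0, 4: 0, 6: 0, 1: 1, 3: 1, 5: 1, 7: 1}


def first_idle_cpu(idle_cpus):
    """Model of first_cpu(nohz.cpu_mask): the lowest-numbered idle CPU."""
    return min(idle_cpus) if idle_cpus else None


def topology_aware_ilb(idle_cpus, package_of=PACKAGE_OF):
    """Hypothetical policy: prefer an idle CPU whose package is already busy."""
    if not idle_cpus:
        return None
    # A package counts as busy if at least one of its CPUs is not idle.
    busy_packages = {pkg for cpu, pkg in package_of.items() if cpu not in idle_cpus}
    candidates = [cpu for cpu in idle_cpus if package_of[cpu] in busy_packages]
    # Fall back to the lowest-numbered idle CPU when every package is fully idle.
    return min(candidates) if candidates else min(idle_cpus)


# Workload pinned with "taskset -c 0,2": all other CPUs are idle.
idle = {1, 3, 4, 5, 6, 7}
naive = first_idle_cpu(idle)      # picks a CPU on the otherwise idle package 1
aware = topology_aware_ilb(idle)  # picks a CPU on the already busy package 0
```

With the workload on CPUs 0 and 2, the first-idle-CPU rule lands on package 1,
while the topology-aware rule stays on package 0, which matches the direction
of the timer-interrupt counts reported in the two tables.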