Date: Wed, 17 Jun 2015 11:11:01 +0800
From: Yuyang Du
To: Boqun Feng
Cc: mingo@kernel.org, peterz@infradead.org, linux-kernel@vger.kernel.org, pjt@google.com, bsegall@google.com, morten.rasmussen@arm.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, len.brown@intel.com, rafael.j.wysocki@intel.com, fengguang.wu@intel.com, srikar@linux.vnet.ibm.com
Subject: Re: [Resend PATCH v8 0/4] sched: Rewrite runnable load and utilization average tracking
Message-ID: <20150617031101.GC1244@intel.com>
In-Reply-To: <20150617051501.GA7154@fixme-laptop.cn.ibm.com>

Hi,

The sched_debug output is informative; let's first give it some analysis.

The workload is 12 CPU-hogging tasks (always runnable) and 1 dbench task doing fs ops (~70% runnable), all running at the same time. These 13 tasks are in a task group, /autogroup-9617, which has weight 1024.
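As a quick arithmetic sketch of the fair-share bound this setup implies (plain Python, not kernel code; the task count and group weight are the ones stated above, everything else is illustrative):

```python
# Fair-share bound for 13 equally weighted, mostly-runnable tasks in a
# task group of weight 1024. Constants come from the workload description
# above; this is illustrative arithmetic, not the kernel's implementation.
GROUP_WEIGHT = 1024
NR_TASKS = 13  # 12 cpu hogs + 1 dbench task

# Each always-runnable task can pull at most its equal slice of the group
# weight down to its CPU's group entity:
per_task_share = GROUP_WEIGHT / NR_TASKS
print(round(per_task_share))      # ~79

# A CPU that ends up with two of the hogs carries about twice that:
print(round(2 * per_task_share))  # ~158
```

This matches the ~79 per single-hog CPU and ~159 per two-hog CPU visible in the dump below.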
So the 13 tasks can each contribute at most an average of 79 (=1024/13) to the group entity's load_avg:

cfs_rq[0]:/autogroup-9617
  .se->load.weight  : 2
  .se->avg.load_avg : 0
cfs_rq[1]:/autogroup-9617
  .se->load.weight  : 80
  .se->avg.load_avg : 79
cfs_rq[2]:/autogroup-9617
  .se->load.weight  : 79
  .se->avg.load_avg : 78
cfs_rq[3]:/autogroup-9617
  .se->load.weight  : 80
  .se->avg.load_avg : 81
cfs_rq[4]:/autogroup-9617
  .se->load.weight  : 80
  .se->avg.load_avg : 79
cfs_rq[5]:/autogroup-9617
  .se->load.weight  : 79
  .se->avg.load_avg : 77
cfs_rq[6]:/autogroup-9617
  .se->load.weight  : 159
  .se->avg.load_avg : 156
cfs_rq[7]:/autogroup-9617
  .se->load.weight  : 64   (dbench)
  .se->avg.load_avg : 50
cfs_rq[8]:/autogroup-9617
  .se->load.weight  : 80
  .se->avg.load_avg : 78
cfs_rq[9]:/autogroup-9617
  .se->load.weight  : 159
  .se->avg.load_avg : 156
cfs_rq[10]:/autogroup-9617
  .se->load.weight  : 80
  .se->avg.load_avg : 78
cfs_rq[11]:/autogroup-9617
  .se->load.weight  : 79
  .se->avg.load_avg : 78

So the runnable load avg accrued in the task group structure is very good. But why, then, is cpu0 so underloaded? The top-level cfs_rq load_avg is:

cfs_rq[0]  : 754
cfs_rq[1]  : 81
cfs_rq[2]  : 85
cfs_rq[3]  : 80
cfs_rq[4]  : 142
cfs_rq[5]  : 86
cfs_rq[6]  : 159
cfs_rq[7]  : 264
cfs_rq[8]  : 79
cfs_rq[9]  : 156
cfs_rq[10] : 78
cfs_rq[11] : 79

We see cfs_rq[0]'s load_avg is 754 even though it is underloaded. So the problem is:

1) The tasks in the workload have very small weight (only ~79 each), because they share a task group.

2) Some "high"-weight task, even if runnable for only a short time, can contribute a "big" amount to the cfs_rq's load_avg.

The patchset does what it set out to do:

1) Very precise task group load avg tracking, from the group down to its child tasks and from the child tasks back up to the group.

2) The combined runnable + blocked load_avg is in effect, so the blocked avg makes its impact felt.

I will try to figure out what makes up cfs_rq[0]'s 754 load_avg, but I also think the tasks have such small weight that it is very easy for them to end up fairly "imbalanced" ....
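For point 2) above, a minimal sketch of the geometric-series averaging involved may help. It assumes the standard decay factor y with y^32 = 1/2 and a constant per-period runnable fraction; this is illustrative only, not the kernel implementation:

```python
# Decayed average of per-period runnable contributions, with y chosen so
# that y**32 == 1/2, as in per-entity load tracking. Illustrative sketch:
# the kernel works in fixed point over ~1ms periods; here we idealize.
y = 0.5 ** (1 / 32)

def steady_load_avg(runnable_frac, weight, periods=2000):
    # Sum runnable_frac * y**i over past periods, normalize by the
    # always-runnable sum, and scale by the entity's weight.
    acc = sum(runnable_frac * y**i for i in range(periods))
    full = sum(y**i for i in range(periods))
    return weight * acc / full

# An always-runnable hog with weight 79 converges to ~79:
print(round(steady_load_avg(1.0, 79)))  # 79
# A ~70%-runnable task with weight 64 converges to ~45, in the same
# ballpark as the 50 reported for dbench in the dump above:
print(round(steady_load_avg(0.7, 64)))  # 45
```

The sketch also shows the sensitivity in point 2): with such small steady-state values, a briefly runnable entity of much larger weight can momentarily dwarf them in the cfs_rq sum.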
Peter, Ben, and others?

In addition, the util_avg is sometimes insanely big; I think I have already found that problem.

Thanks,
Yuyang

On Wed, Jun 17, 2015 at 01:15:01PM +0800, Boqun Feng wrote:
> On Wed, Jun 17, 2015 at 11:06:50AM +0800, Boqun Feng wrote:
> > Hi Yuyang,
> >
> > I've run the following test on tip/master, without and with your
> > patchset:
> >
> > On a 12-core system (Intel(R) Xeon(R) CPU X5690 @ 3.47GHz):
> >
> >     run stress --cpu 12
> >     run dbench 1
>
> Sorry, I forgot to say that `stress --cpu 12` and `dbench 1` run
> simultaneously. Thanks to Yuyang for reminding me of that.
>
> Regards,
> Boqun