Message-ID: <BLU437-SMTP26FF0C83CB1EA7DECF2FAC80A50@phx.gbl>
Date: Thu, 18 Jun 2015 14:31:00 +0800
From: Wanpeng Li <wanpeng.li@hotmail.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: Yuyang Du <yuyang.du@intel.com>, Boqun Feng <boqun.feng@gmail.com>
CC: mingo@kernel.org, peterz@infradead.org, linux-kernel@vger.kernel.org,
        pjt@google.com, bsegall@google.com, morten.rasmussen@arm.com,
        vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
        len.brown@intel.com, rafael.j.wysocki@intel.com,
        fengguang.wu@intel.com, srikar@linux.vnet.ibm.com
Subject: Re: [Resend PATCH v8 0/4] sched: Rewrite runnable load and utilization
 average tracking
References: <1434396367-27979-1-git-send-email-yuyang.du@intel.com> <20150617030650.GB5695@fixme-laptop.cn.ibm.com> <20150617051501.GA7154@fixme-laptop.cn.ibm.com> <20150617031101.GC1244@intel.com>
In-Reply-To: <20150617031101.GC1244@intel.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4045
Lines: 146


On 6/17/15 11:11 AM, Yuyang Du wrote:
> Hi,
>
> The sched_debug output is informative, so let's analyze it first.
>
> The workload is 12 CPU hogging tasks (always runnable) and 1 dbench
> task doing fs ops (70% runnable) running at the same time.
>
> Actually, all 13 tasks are in one task group, /autogroup-9617, which
> has a weight of 1024.
>
> So, on average, each of the 13 tasks can contribute at most about 79
> (= 1024/13) to the group entity's load_avg:
>
> cfs_rq[0]:/autogroup-9617
> .se->load.weight               : 2
> .se->avg.load_avg              : 0
>
> cfs_rq[1]:/autogroup-9617
> .se->load.weight               : 80
> .se->avg.load_avg              : 79
>
> cfs_rq[2]:/autogroup-9617
> .se->load.weight               : 79
> .se->avg.load_avg              : 78
>
> cfs_rq[3]:/autogroup-9617
> .se->load.weight               : 80
> .se->avg.load_avg              : 81
>
> cfs_rq[4]:/autogroup-9617
> .se->load.weight               : 80
> .se->avg.load_avg              : 79
>
> cfs_rq[5]:/autogroup-9617
> .se->load.weight               : 79
> .se->avg.load_avg              : 77
>
> cfs_rq[6]:/autogroup-9617
> .se->load.weight               : 159
> .se->avg.load_avg              : 156
>
> cfs_rq[7]:/autogroup-9617
> .se->load.weight               : 64  (dbench)
> .se->avg.load_avg              : 50

How did you figure out that this one is dbench?

Regards,
Wanpeng Li

>
> cfs_rq[8]:/autogroup-9617
> .se->load.weight               : 80
> .se->avg.load_avg              : 78
>
> cfs_rq[9]:/autogroup-9617
> .se->load.weight               : 159
> .se->avg.load_avg              : 156
>
> cfs_rq[10]:/autogroup-9617
> .se->load.weight               : 80
> .se->avg.load_avg              : 78
>
> cfs_rq[11]:/autogroup-9617
> .se->load.weight               : 79
> .se->avg.load_avg              : 78
>
> So the runnable load avg accrued in the task group structure looks
> very good.
>
> However, why is cpu0 so underloaded?
>
> The top cfs's load_avg is:
>
> cfs_rq[0]: 754
> cfs_rq[1]: 81
> cfs_rq[2]: 85
> cfs_rq[3]: 80
> cfs_rq[4]: 142
> cfs_rq[5]: 86
> cfs_rq[6]: 159
> cfs_rq[7]: 264
> cfs_rq[8]: 79
> cfs_rq[9]: 156
> cfs_rq[10]: 78
> cfs_rq[11]: 79
>
> We see that cfs_rq[0]'s load_avg is 754 even though it is underloaded.
>
> So the problem is:
>
> 1) The tasks in the workload have too small a weight (only ~79), because
>     they share a task group.
>
> 2) Probably some "high"-weight task, even if runnable only for a short
>     time, contributes "big" to the cfs_rq's load_avg.
>
> The patchset does what it is meant to do:
>
> 1) very precise tracking of the task group's load avg, from the group to
>     its child tasks and from the child tasks to the group.
>
> 2) the combined runnable + blocked load_avg is effective, so the blocked
>     avg has made its impact.
>
> I will try to figure out what causes cfs_rq[0]'s load_avg of 754, but
> I also think the tasks have such small weights that they very easily
> end up fairly "imbalanced"....
>
> Peter, Ben, and others?
>
> In addition, util_avg is sometimes insanely big; I think I have already
> found the problem.
>
> Thanks,
> Yuyang
>
> On Wed, Jun 17, 2015 at 01:15:01PM +0800, Boqun Feng wrote:
>> On Wed, Jun 17, 2015 at 11:06:50AM +0800, Boqun Feng wrote:
>>> Hi Yuyang,
>>>
>>> I've run the test as follows on tip/master, without and with your
>>> patchset:
>>>
>>> On a 12-core system (Intel(R) Xeon(R) CPU X5690 @ 3.47GHz)
>>> run stress --cpu 12
>>> run dbench 1
>> Sorry, I forgot to mention that `stress --cpu 12` and `dbench 1` were
>> running simultaneously. Thanks to Yuyang for reminding me of that.
>>
>> Regards,
>> Boqun
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
