Message-ID: <50B461A3.8000200@linux.vnet.ibm.com>
Date: Tue, 27 Nov 2012 12:15:55 +0530
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120717 Thunderbird/14.0
MIME-Version: 1.0
To: Alex Shi <alex.shi@intel.com>
CC: Benjamin Segall <bsegall@google.com>, mingo@redhat.com,
        peterz@infradead.org, pjt@google.com, vincent.guittot@linaro.org,
        linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] enable runnable load avg in load balance
References: <1353157457-3649-1-git-send-email-alex.shi@intel.com> <xm26ehjg5gra.fsf@sword-of-the-dawn.mtv.corp.google.com> <50B42EB0.8090609@linux.vnet.ibm.com> <50B45A2A.7030201@intel.com>
In-Reply-To: <50B45A2A.7030201@intel.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4333
Lines: 106

Hi,
On 11/27/2012 11:44 AM, Alex Shi wrote:
> On 11/27/2012 11:08 AM, Preeti U Murthy wrote:
>> Hi everyone,
>>
>> On 11/27/2012 12:33 AM, Benjamin Segall wrote:
>>> So, I've been trying out using the runnable averages for load balance in
>>> a few ways, but haven't actually gotten any improvement on the
>>> benchmarks I've run. I'll post my patches once I have the numbers down,
>>> but it's generally been about half a percent to 1% worse on the tests
>>> I've tried.
>>>
>>> The basic idea is to use (cfs_rq->runnable_load_avg +
>>> cfs_rq->blocked_load_avg) (which should be equivalent to doing
>>> load_avg_contrib on the rq) for cfs_rqs and possibly the rq, and
>>> p->se.load.weight * p->se.avg.runnable_avg_sum / period for tasks.
>>
>> Why should cfs_rq->blocked_load_avg be included when calculating the load
>> on the rq? Blocked entities do not contribute to the active load of the
>> cpu, right?
>>
>> When a task goes to sleep, its load is also removed from
>> cfs_rq->load.weight in account_entity_dequeue(). This means the load
>> balancer considers a sleeping entity as *not* contributing to the active
>> runqueue load. So shouldn't the new metric consider
>> cfs_rq->runnable_load_avg alone?
>>>
>>> I have not yet tried including wake_affine, so this has just involved
>>> h_load (task_load_down and task_h_load), as that makes everything
>>> (besides wake_affine) be based on either the new averages or the
>>> rq->cpu_load averages.
>>>
>>
>> Yeah, I have been trying to measure the performance as well, but with
>> cfs_rq->runnable_load_avg as the rq load contribution and the task load
>> the same as mentioned above. I have not completed my experiments, but I
>> would expect a significant performance difference due to the scenario below:
>>
>>                      Task3(10% task)
>> Task1(100% task)     Task4(10% task)
>> Task2(100% task)     Task5(10% task)
>> ---------------     ----------------       ----------
>> CPU1                  CPU2                  CPU3
>>
>> When cpu3 triggers load balancing:
>>
>> CASE1:
>>  without PJT's metric the following loads will be perceived
>>  CPU1->2048
>>  CPU2->3042
>>  Therefore CPU2 might be relieved of one task to result in:
>>
>>
>> Task1(100% task)     Task4(10% task)
>> Task2(100% task)     Task5(10% task)       Task3(10% task)
>> ---------------     ----------------       ----------
>> CPU1                  CPU2                  CPU3
>>
>> CASE2:
>>   with PJT's metric the following loads will be perceived
>>   CPU1->2048
>>   CPU2->1022
>>  Therefore CPU1 might be relieved of one task to result in:
>>
>>                      Task3(10% task)
>>                      Task4(10% task)
>> Task2(100% task)     Task5(10% task)     Task1(100% task)
>> ---------------     ----------------       ----------
>> CPU1                  CPU2                  CPU3
>>
>>
>> The differences between the above two scenarios include:
>>
>> 1. Reduced latency for Task1 in CASE2, which is the right task to be moved
>> in the above scenario.
>>
>> 2. Even though CPU2 is relieved of one task in the former case, it's of no
>> use if Task3 is going to sleep most of the time. This might result in
>> more load balancing on behalf of cpu3.
>>
>> What do you guys think?
> 
> It looks fine. Just a question about CASE 1.
> Usually cpu2, with three 10% load tasks, will show nr_running == 0 about
> 70% of the time. So how do you make rq->nr_running = 3 always?
> 
> I guess in most cases load balancing will pull task1 or task2 to cpu2 or
> cpu3, rather than produce the result of CASE 1.

That's right, Alex. Most of the time nr_running on CPU2 will read 0, or
perhaps 1 or 2. But whether or not you use PJT's metric, the load
balancer will behave the same in those circumstances, as you have rightly
pointed out: it will pull Task1 or Task2 to CPU2 or CPU3.
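
As a rough check on how often each nr_running value would be seen, here
is a small user-space sketch (not kernel code). It assumes the three tasks
run independently of one another and that each is runnable 10% of the
time; the function name is hypothetical and only for illustration.

```c
#include <assert.h>

/*
 * Chance, in parts per million, that exactly k of 3 tasks are runnable
 * at a given instant. Assumption: the tasks are independent and each is
 * runnable pct percent of the time.
 */
static unsigned long ppm_k_of_3_runnable(unsigned int k, unsigned int pct)
{
    /* Binomial coefficients C(3, k) for k = 0..3. */
    static const unsigned long choose3[] = { 1, 3, 3, 1 };
    unsigned long p = 1;

    assert(k <= 3 && pct <= 100);

    /* k tasks runnable (pct each), the other 3 - k idle (100 - pct each). */
    for (unsigned int i = 0; i < 3; i++)
        p *= (i < k) ? pct : 100 - pct;

    /* The product is out of 100^3 = 1,000,000, i.e. already in ppm. */
    return choose3[k] * p;
}
```

Under these assumptions, ppm_k_of_3_runnable(0, 10) is 729000, so
nr_running == 0 about 73% of the time, while all three tasks being
runnable at once comes out at 1000 ppm, or 0.1% of the time. Tasks that
wake on a common event would break the independence assumption and hit
the nr_running == 3 case more often.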

The issue arises when all three tasks wake up at the same time on
CPU2, wrongly portraying the load as 3042 if PJT's metric is not
used. This could lead to one of these short-running tasks being
balanced away, as shown in CASE1. This is the situation where, in my
opinion, PJT's metric could make a difference.
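
The two views of the load can be compared with a toy user-space model
(a sketch only, not the kernel's implementation). It assumes every task
has the nice-0 weight of 1024 and uses a plain percentage as a stand-in
for runnable_avg_sum / period; struct toy_task and both function names
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NICE_0_WEIGHT 1024UL    /* load weight of a nice-0 task */

/* Hypothetical task model: weight plus percentage of time runnable. */
struct toy_task {
    unsigned long weight;
    unsigned int runnable_pct;  /* stand-in for runnable_avg_sum / period */
};

/* Load without the runnable average: sum the weights of all tasks
 * that are on the runqueue at the instant of balancing. */
static unsigned long instantaneous_load(const struct toy_task *t, size_t n)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += t[i].weight;
    return sum;
}

/* Load with each task's weight scaled by its runnable fraction. */
static unsigned long runnable_avg_load(const struct toy_task *t, size_t n)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++) {
        assert(t[i].runnable_pct <= 100);
        sum += t[i].weight * t[i].runnable_pct / 100;
    }
    return sum;
}
```

With two 100% tasks on CPU1, both functions return 2048. With three 10%
tasks on CPU2 that all happen to be queued, instantaneous_load() returns
3072, which is above CPU1, while runnable_avg_load() returns 306, which is
well below it. The exact values depend on the weights and the averaging
used; what matters is that the ordering of the two CPUs flips.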

Regards
Preeti U Murthy
