MIME-Version: 1.0
In-Reply-To: <CAKfTPtBMeL9Omvj+KzL2KAhTH8rjz5BzPypaj6DmCXn0ykZpWg@mail.gmail.com>
References: <1480088073-11642-1-git-send-email-vincent.guittot@linaro.org>
 <1480088073-11642-3-git-send-email-vincent.guittot@linaro.org>
 <20161130124912.GD1716@e105550-lin.cambridge.arm.com> <CAKfTPtBMeL9Omvj+KzL2KAhTH8rjz5BzPypaj6DmCXn0ykZpWg@mail.gmail.com>
From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Wed, 30 Nov 2016 14:54:00 +0100
Message-ID: <CAKfTPtAavCw+bYXOd1f6V7SfWjn3xSJNVcR3Za9eOMP_GFHd3w@mail.gmail.com>
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        linux-kernel <linux-kernel@vger.kernel.org>,
        Matt Fleming <matt@codeblueprint.co.uk>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Wanpeng Li <kernellwp@gmail.com>, Yuyang Du <yuyang.du@intel.com>,
        Mike Galbraith <umgwanakikbuti@gmail.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org

On 30 November 2016 at 14:49, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> On 30 November 2016 at 13:49, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> On Fri, Nov 25, 2016 at 04:34:33PM +0100, Vincent Guittot wrote:
>>> find_idlest_group() only compares the runnable_load_avg when looking for
>>> the least loaded group. But on fork intensive use case like hackbench
>
> [snip]
>
>>> +                             min_avg_load = avg_load;
>>> +                             idlest = group;
>>> +                     } else if ((runnable_load < (min_runnable_load + imbalance)) &&
>>> +                                     (100*min_avg_load > imbalance_scale*avg_load)) {
>>> +                             /*
>>> +                              * The runnable loads are close so we take
>>> +                              * into account blocked load through avg_load
>>> +                              *  which is blocked + runnable load
>>> +                              */
>>> +                             min_avg_load = avg_load;
>>>                               idlest = group;
>>>                       }
>>>
>>> @@ -5470,13 +5495,16 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>>>               goto no_spare;
>>>
>>>       if (this_spare > task_util(p) / 2 &&
>>> -         imbalance*this_spare > 100*most_spare)
>>> +         imbalance_scale*this_spare > 100*most_spare)
>>>               return NULL;
>>>       else if (most_spare > task_util(p) / 2)
>>>               return most_spare_sg;
>>>
>>>  no_spare:
>>> -     if (!idlest || 100*this_load < imbalance*min_load)
>>> +     if (!idlest ||
>>> +         (min_runnable_load > (this_runnable_load + imbalance)) ||
>>> +         ((this_runnable_load < (min_runnable_load + imbalance)) &&
>>> +                     (100*min_avg_load > imbalance_scale*this_avg_load)))
>>
>> I don't get why you have imbalance_scale applied to this_avg_load and
>> not min_avg_load. IIUC, you end up preferring non-local groups?
>
> In fact, I have kept the same condition that is used when looping over the groups.
> You're right that we should prefer the local rq if the avg_load values are close and
> test the condition
> (100*this_avg_load > imbalance_scale*min_avg_load) instead

Of course, the correct condition is
 (100*this_avg_load < imbalance_scale*min_avg_load)

>
>>
>> Take the example where this_runnable_load == min_runnable_load and
>> this_avg_load == min_avg_load. In this case, and in cases where
>> min_avg_load is slightly bigger than this_avg_load, we end up picking
>> the 'idlest' group even if the local group is equally good or even
>> slightly better?
>>
>>>               return NULL;
>>>       return idlest;
>>>  }
>>
>> Overall, I like that load_avg is being brought in to make better
>> decisions. The variable naming is a bit confusing. For example,
>> runnable_load is a capacity-average just like avg_load. 'imbalance' is
>> now an absolute capacity-average margin, but it is hard to come up with
>> better short alternatives.
>>
>> Although 'imbalance' is based on the existing imbalance_pct, I find it
>> somewhat arbitrary. Why is (imbalance_pct-100)*1024/100 a good absolute
>> margin to define the interval where we want to consider load_avg? I
>> guess it is a case of 'we had to pick some value', which we have done in
>> many other places. Though, IMHO, it is a bit strange that imbalance_pct
>> is used in two different ways to bias comparisons in the same function.
>
> I see imbalance_pct as the definition of the acceptable imbalance, in
> percent, for a sched_domain. This percentage is then applied either
> relative to the current load or to define an absolute margin.
>
>> It used to be used only as a scaling factor (now imbalance_scale), while
>> this patch proposes to use it for computing an absolute margin
>> (imbalance) as well. It is not a major issue, but it is not clear why it
>> is used differently to compare two metrics that are relatively closely
>> related.
>
> In fact, a scaling factor (imbalance) doesn't work well with small
> values. For example, a scaling factor fails as soon as
> this_runnable_load = 0, because we always select the local rq even if
> min_runnable_load is only 1, which doesn't really make sense because
> the two loads are effectively the same.
>
>>
>> Morten