Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757352AbcK3Nyb (ORCPT ); Wed, 30 Nov 2016 08:54:31 -0500
Received: from mail-wj0-f174.google.com ([209.85.210.174]:36112 "EHLO mail-wj0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753217AbcK3NyX (ORCPT ); Wed, 30 Nov 2016 08:54:23 -0500
MIME-Version: 1.0
References: <1480088073-11642-1-git-send-email-vincent.guittot@linaro.org> <1480088073-11642-3-git-send-email-vincent.guittot@linaro.org> <20161130124912.GD1716@e105550-lin.cambridge.arm.com>
From: Vincent Guittot
Date: Wed, 30 Nov 2016 14:54:00 +0100
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
To: Morten Rasmussen
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Matt Fleming, Dietmar Eggemann, Wanpeng Li, Yuyang Du, Mike Galbraith
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 30 November 2016 at 14:49, Vincent Guittot wrote:
> On 30 November 2016 at 13:49, Morten Rasmussen wrote:
>> On Fri, Nov 25, 2016 at 04:34:33PM +0100, Vincent Guittot wrote:
>>> find_idlest_group() only compares the runnable_load_avg when looking for
>>> the least loaded group.
>>> But on fork intensive use case like hackbench
>
> [snip]
>
>>> +                             min_avg_load = avg_load;
>>> +                             idlest = group;
>>> +                     } else if ((runnable_load < (min_runnable_load + imbalance)) &&
>>> +                                     (100*min_avg_load > imbalance_scale*avg_load)) {
>>> +                             /*
>>> +                              * The runnable loads are close so we take
>>> +                              * into account blocked load through avg_load
>>> +                              * which is blocked + runnable load
>>> +                              */
>>> +                             min_avg_load = avg_load;
>>>                               idlest = group;
>>>                       }
>>>
>>> @@ -5470,13 +5495,16 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>>>               goto no_spare;
>>>
>>>       if (this_spare > task_util(p) / 2 &&
>>> -         imbalance*this_spare > 100*most_spare)
>>> +         imbalance_scale*this_spare > 100*most_spare)
>>>               return NULL;
>>>       else if (most_spare > task_util(p) / 2)
>>>               return most_spare_sg;
>>>
>>>  no_spare:
>>> -     if (!idlest || 100*this_load < imbalance*min_load)
>>> +     if (!idlest ||
>>> +         (min_runnable_load > (this_runnable_load + imbalance)) ||
>>> +         ((this_runnable_load < (min_runnable_load + imbalance)) &&
>>> +                     (100*min_avg_load > imbalance_scale*this_avg_load)))
>>
>> I don't get why you have imbalance_scale applied to this_avg_load and
>> not min_avg_load. IIUC, you end up preferring non-local groups?
>
> In fact, I have kept the same condition that is used when looping over
> the groups.
> You're right that we should prefer the local rq if the avg_load values
> are close and test the condition
> (100*this_avg_load > imbalance_scale*min_avg_load) instead

Of course the correct condition is
(100*this_avg_load < imbalance_scale*min_avg_load)

>
>>
>> If we take the example where this_runnable_load == min_runnable_load and
>> this_avg_load == min_avg_load. In this case, and in cases where
>> min_avg_load is slightly bigger than this_avg_load, we end up picking
>> the 'idlest' group even if the local group is equally good or even
>> slightly better?
>>
>>>               return NULL;
>>>       return idlest;
>>>  }
>>
>> Overall, I like that load_avg is being brought in to make better
>> decisions.
>> The variable naming is a bit confusing. For example,
>> runnable_load is a capacity-average just like avg_load. 'imbalance' is
>> now an absolute capacity-average margin, but it is hard to come up with
>> better short alternatives.
>>
>> Although 'imbalance' is based on the existing imbalance_pct, I find it
>> somewhat arbitrary. Why is (imbalance_pct-100)*1024/100 a good absolute
>> margin to define the interval where we want to consider load_avg? I
>> guess it is a case of 'we had to pick some value', which we have done in
>> many other places. Though, IMHO, it is a bit strange that imbalance_pct
>> is used in two different ways to bias comparison in the same function.
>
> I see imbalance_pct as the definition of the acceptable imbalance %
> for a sched_domain. This % is then used against the current load or to
> define an absolute value.
>
>> It used to be only used as a scaling factor (now imbalance_scale), while
>> this patch proposes to use it for computing an absolute margin
>> (imbalance) as well. It is not a major issue, but it is not clear why it
>> is used differently to compare two metrics that are relatively closely
>> related.
>
> In fact, a scaling factor (imbalance) doesn't work well with small
> values. As an example, the use of a scaling factor fails as soon as
> this_runnable_load = 0, because we always select the local rq even if
> min_runnable_load is only 1, which doesn't really make sense because
> they are almost the same.
>
>>
>> Morten
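
To make the small-value case above concrete, here is a minimal standalone
sketch (userspace C, not the kernel code) that contrasts the old pure
scaling test with the margin-based test using the corrected avg_load
condition. The function names are hypothetical, the !idlest check is left
out, and the values imbalance_pct = 125 and imbalance_scale = 112 are
assumptions chosen only for illustration; the margin formula is the
(imbalance_pct-100)*1024/100 expression quoted in the thread.

```c
#include <stdbool.h>

/* Absolute margin as quoted in the thread: (imbalance_pct-100)*1024/100. */
static unsigned long margin_from_pct(unsigned long imbalance_pct)
{
	return (imbalance_pct - 100) * 1024 / 100;
}

/*
 * Old-style test (hypothetical helper name): stay on the local group when
 * 100*this_load < scale*min_load. With this_load == 0 this is true for any
 * non-zero min_load, however small.
 */
static bool stay_local_scaled(unsigned long this_load, unsigned long min_load,
			      unsigned long scale_pct)
{
	return 100 * this_load < scale_pct * min_load;
}

/*
 * Margin-based test (hypothetical helper name) with the corrected avg_load
 * comparison from this mail. Stay local when the idlest group is clearly
 * more loaded, or when the runnable loads are within the margin and the
 * local avg_load is not significantly higher.
 */
static bool stay_local_margin(unsigned long this_runnable,
			      unsigned long min_runnable,
			      unsigned long this_avg, unsigned long min_avg,
			      unsigned long margin, unsigned long scale_pct)
{
	if (min_runnable > this_runnable + margin)
		return true;

	return this_runnable < min_runnable + margin &&
	       100 * this_avg < scale_pct * min_avg;
}
```

With this_runnable_load = 0 and min_runnable_load = 1, the scaled test
always keeps the task local, whereas the margin test treats the two
runnable loads as equivalent and lets avg_load decide: equal avg_load
values keep the task local, while a much higher local avg_load lets the
idlest group win.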