Subject: Re: [RFC PATCH] sched: fix the nonsense shares when load of cfs_rq is too, small
From: 王贇 (Michael Wang) <yun.wang@linux.alibaba.com>
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
    Ben Segall, Mel Gorman, "open list:SCHEDULER"
Date: Thu, 5 Mar 2020 09:23:55 +0800
On 2020/3/4 5:43 PM, Vincent Guittot wrote:
> On Wed, 4 Mar 2020 at 09:47, Vincent Guittot wrote:
>>
>> On Wed, 4 Mar 2020 at 02:19, 王贇 wrote:
>>>
>>> On 2020/3/4 3:52 AM, Peter Zijlstra wrote:
>>> [snip]
>>>>> The reason is that we have group B with shares of 2, which makes
>>>>> group A's 'cfs_rq->load.weight' very small.
>>>>>
>>>>> And in calc_group_shares() we calculate shares as:
>>>>>
>>>>>   load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
>>>>>   shares = (tg_shares * load) / tg_weight;
>>>>>
>>>>> Since 'cfs_rq->load.weight' is too small, load becomes 0 here;
>>>>> although 'tg_shares' is 102400, the shares of the se which
>>>>> stands for group A on the root cfs_rq become 2.
>>>>
>>>> Argh, because A->cfs_rq.load.weight is B->se.load.weight which is
>>>> B->shares/nr_cpus.
>>>
>>> Yeah, that's exactly why it happens. Even though the share of 2 is
>>> scaled up to 2048, on a 96-CPU platform each CPU gets only 21 in the
>>> equal case.
>>>
>>>>
>>>>> Meanwhile the se of D on the root cfs_rq is far bigger than 2, so it
>>>>> wins the battle.
>>>>>
>>>>> This patch adds a check on the zero load and makes it MIN_SHARES
>>>>> to fix the nonsense shares; after it is applied, group C wins as
>>>>> expected.
>>>>>
>>>>> Signed-off-by: Michael Wang
>>>>> ---
>>>>>  kernel/sched/fair.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 84594f8aeaf8..53d705f75fa4 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -3182,6 +3182,8 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
>>>>>  	tg_shares = READ_ONCE(tg->shares);
>>>>>
>>>>>  	load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
>>>>> +	if (!load && cfs_rq->load.weight)
>>>>> +		load = MIN_SHARES;
>>>>>
>>>>>  	tg_weight = atomic_long_read(&tg->load_avg);
>>>>
>>>> Yeah, I suppose that'll do. Hurmph, wants a comment though.
>>>>
>>>> But that has me looking at other users of scale_load_down(), and doesn't
>>>> at least update_tg_cfs_load() suffer the same problem?
>>>
>>> Good point :-) I'm not sure, but is scale_load_down() supposed to scale a
>>> small value down to 0? If not, maybe we should fix the helper to make sure
>>> it at least returns some real load, like:
>>>
>>> # define scale_load_down(w) \
>>>	((w + (1 << SCHED_FIXEDPOINT_SHIFT)) >> SCHED_FIXEDPOINT_SHIFT)
>>
>> you will add +1 of nice prio for each device
>
> Of course, it's not prio but only weight which is different

That's right, we should only handle the problematic cases.

Regards,
Michael Wang

>
>>
>> should we use instead
>> # define scale_load_down(w) \
>>	((w >> SCHED_FIXEDPOINT_SHIFT) ? (w >> SCHED_FIXEDPOINT_SHIFT) : MIN_SHARES)
>>
>> Regards,
>> Vincent
>>
>>>
>>> Regards,
>>> Michael Wang
>>>
>>>>