To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    "open list:SCHEDULER"
From: 王贇 <yun.wang@linux.alibaba.com>
Subject: [PATCH v2] sched: avoid scale real weight down to zero
Message-ID: <38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com>
Date: Wed, 18 Mar 2020 10:15:15 +0800
During our testing we found a case where cpu.shares no longer works
correctly. The cgroup topology is:

  /sys/fs/cgroup/cpu/A		(shares=102400)
  /sys/fs/cgroup/cpu/A/B	(shares=2)
  /sys/fs/cgroup/cpu/A/B/C	(shares=1024)

  /sys/fs/cgroup/cpu/D		(shares=1024)
  /sys/fs/cgroup/cpu/D/E	(shares=1024)
  /sys/fs/cgroup/cpu/D/E/F	(shares=1024)

The same benchmark runs in groups C and F, no other tasks are running,
and the benchmark is capable of consuming all the CPUs.

We expected group C to win more CPU resources, since it can enjoy all
the shares of group A, but it is F that wins much more.

The reason is group B, whose shares are 2: since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
A->cfs_rq.load.weight becomes very small.

In calc_group_shares() we calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since 'cfs_rq->load.weight' is so small, the load becomes 0 after the
scale down. Although 'tg_shares' is 102400, the shares of the se which
stands for group A on the root cfs_rq become 2, while the weight of the
se for D on the root cfs_rq is far bigger than 2, so it wins the battle.

Thus, when scale_load_down() scales a real weight down to 0, it no
longer tells the real story: the caller gets the wrong information and
the calculation becomes buggy.

This patch adds a check in scale_load_down() so that a non-zero real
weight is >= MIN_SHARES after the scale; with it applied, group C wins
as expected.

Cc: Ben Segall
Reviewed-by: Vincent Guittot
Suggested-by: Peter Zijlstra
Signed-off-by: Michael Wang
---
v2:
  * replace MIN_SHARES with 2UL to cover the CONFIG_FAIR_GROUP_SCHED=n case

 kernel/sched/sched.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2a0caf394dd4..9bca26bd60d9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -118,7 +118,13 @@ extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 #ifdef CONFIG_64BIT
 # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
-# define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
+# define scale_load_down(w) \
+({ \
+	unsigned long __w = (w); \
+	if (__w) \
+		__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
+	__w; \
+})
 #else
 # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w)		(w)
--
2.14.4.44.g2045bb6
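
For reference, a minimal user-space sketch (not part of the patch) of the
arithmetic described above. It assumes SCHED_FIXEDPOINT_SHIFT == 10 as on
64-bit kernels, and nr_cpus = 8 is picked purely for illustration; the
scale_load_down_old()/scale_load_down_new() helpers are stand-ins for the
macro before and after this change, and the exact weights on a real system
will differ.

	#include <stdio.h>

	#define SCHED_FIXEDPOINT_SHIFT	10
	#define MIN_SHARES		2UL

	#define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)

	/* Pre-patch behaviour: any weight below 1024 scales down to 0. */
	static unsigned long scale_load_down_old(unsigned long w)
	{
		return w >> SCHED_FIXEDPOINT_SHIFT;
	}

	/* Patched behaviour: a non-zero weight never scales below MIN_SHARES. */
	static unsigned long scale_load_down_new(unsigned long w)
	{
		unsigned long __w = w;

		if (__w) {
			__w >>= SCHED_FIXEDPOINT_SHIFT;
			if (__w < MIN_SHARES)
				__w = MIN_SHARES;
		}
		return __w;
	}

	int main(void)
	{
		unsigned long nr_cpus = 8;	/* illustrative value */
		/* cpu.shares=2 of group B, scaled up and spread over the CPUs,
		 * is roughly what ends up as A->cfs_rq.load.weight. */
		unsigned long weight = scale_load(2UL) / nr_cpus;

		printf("A->cfs_rq.load.weight        : %lu\n", weight);
		printf("scaled load before the patch : %lu\n",
		       scale_load_down_old(weight));	/* 0: tg_shares of 102400 are lost */
		printf("scaled load after the patch  : %lu\n",
		       scale_load_down_new(weight));	/* 2: the floor is preserved */
		return 0;
	}

With these assumed numbers the sketch prints 256, 0 and 2: the pre-patch
scale down collapses A's weight to 0, so calc_group_shares() hands group A
only the MIN_SHARES floor, while the patched version keeps a non-zero load
in the numerator.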