To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, "open list:SCHEDULER", Vincent Guittot
From: 王贇
Subject: [PATCH] sched: avoid scale real weight down to zero
Date: Thu, 5 Mar 2020 10:57:03 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

During our testing, we
found a case where shares no longer work correctly. The cgroup topology
is:

  /sys/fs/cgroup/cpu/A          (shares=102400)
  /sys/fs/cgroup/cpu/A/B        (shares=2)
  /sys/fs/cgroup/cpu/A/B/C      (shares=1024)

  /sys/fs/cgroup/cpu/D          (shares=1024)
  /sys/fs/cgroup/cpu/D/E        (shares=1024)
  /sys/fs/cgroup/cpu/D/E/F      (shares=1024)

The same benchmark runs in groups C and F, no other tasks are running,
and the benchmark is capable of consuming all the CPUs.

We expected group C to win more CPU resources, since it can enjoy all
the shares of group A, but it is F that wins much more.

The reason is that group B has shares of 2. Since

  A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus

A->cfs_rq.load.weight becomes very small. In calc_group_shares() we
calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since 'cfs_rq->load.weight' is too small, the load becomes 0 after
scaling down. Although 'tg_shares' is 102400, the shares of the se that
stands for group A on the root cfs_rq become 2, while the se of D on
the root cfs_rq is far bigger than 2, so D wins the battle.

Thus when scale_load_down() scales a real weight down to 0, it no
longer tells the real story: the caller gets the wrong information and
the calculation is buggy.

This patch adds a check in scale_load_down() so that a non-zero real
weight is >= MIN_SHARES after scaling. With the patch applied, group C
wins as expected.
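
The arithmetic described above can be sketched in user space. This is only
an illustration, not the kernel code: SCHED_FIXEDPOINT_SHIFT = 10 and
MIN_SHARES = 2 are assumed values, and scale_load_down_old(),
scale_load_down_new(), group_shares() and check_small_weight_case() are
hypothetical helper names. The sketch also assumes that the computed shares
are clamped to the range [MIN_SHARES, tg_shares], which would explain why
the se of group A ends up with 2.

```c
#include <assert.h>

/* Assumed values; they are not taken from this patch. */
#define SCHED_FIXEDPOINT_SHIFT	10
#define MIN_SHARES		(1UL << 1)

/* Behaviour before the patch: a plain right shift. */
static inline unsigned long scale_load_down_old(unsigned long w)
{
	return w >> SCHED_FIXEDPOINT_SHIFT;
}

/* Behaviour after the patch: non-zero weights are floored at MIN_SHARES. */
static inline unsigned long scale_load_down_new(unsigned long w)
{
	unsigned long scaled = w >> SCHED_FIXEDPOINT_SHIFT;

	if (!w)
		return 0;
	return scaled > MIN_SHARES ? scaled : MIN_SHARES;
}

/*
 * Hypothetical, simplified stand-in for the share calculation quoted above:
 * shares = tg_shares * load / tg_weight, clamped to [MIN_SHARES, tg_shares].
 * The clamp is an assumption of this sketch.
 */
static inline unsigned long group_shares(unsigned long tg_shares,
					 unsigned long load,
					 unsigned long tg_weight)
{
	unsigned long shares = tg_shares * load;

	if (tg_weight)
		shares /= tg_weight;
	if (shares < MIN_SHARES)
		return MIN_SHARES;
	return shares > tg_shares ? tg_shares : shares;
}

/*
 * Scenario from the description: a tiny weight on A's cfs_rq. tg_weight is
 * assumed to be the sum of the loads, with only this cfs_rq contributing.
 */
static inline void check_small_weight_case(void)
{
	unsigned long w = 2;	/* stands in for a tiny A->cfs_rq.load.weight */

	/* Old: the load collapses to 0, so group A is left with the minimum. */
	assert(scale_load_down_old(w) == 0);
	assert(group_shares(102400, scale_load_down_old(w), 0) == MIN_SHARES);

	/* New: the load stays non-zero, so group A keeps its full shares. */
	assert(scale_load_down_new(w) == MIN_SHARES);
	assert(group_shares(102400, scale_load_down_new(w), 2) == 102400);
}
```

Calling check_small_weight_case() from a test program exercises both paths.
Plain functions are used here instead of a statement-expression macro only
to keep the sketch portable outside the kernel tree.
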

Cc: Ben Segall
Cc: Vincent Guittot
Suggested-by: Peter Zijlstra
Signed-off-by: Michael Wang
---
 kernel/sched/sched.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2a0caf394dd4..75c283f22256 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -118,7 +118,13 @@ extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 #ifdef CONFIG_64BIT
 # define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
-# define scale_load_down(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
+# define scale_load_down(w) \
+({ \
+	unsigned long __w = (w); \
+	if (__w) \
+		__w = max(MIN_SHARES, __w >> SCHED_FIXEDPOINT_SHIFT); \
+	__w; \
+})
 #else
 # define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w) (w)
--
2.14.4.44.g2045bb6