Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1394452ybt; Thu, 9 Jul 2020 06:07:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxbIJFvulXe+sHDJ5aOwNvKqSpgEzvaCNUtcVYsQpxnAViSAGc9mWsORebBOlCu3dwDD9rw X-Received: by 2002:a17:906:7743:: with SMTP id o3mr6805558ejn.224.1594300078458; Thu, 09 Jul 2020 06:07:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594300078; cv=none; d=google.com; s=arc-20160816; b=h/iqN+NgBSskCq72frbRXS17C7zJdqJPjAU9XbRdecXgZSp+udCF0in/Chcg+CNVlP fqaHL750x0UrT2UDuOT3Q5z6/8PrZJ8MKAW5R6lTM/Qdhaqks5STG7p3jToaix41yN/h EeTS2Z4O+ztsRiQPDWPAXulDeW0si5M3q/vc8ybveWV0rU5rfmy4heE1SjyMUugspYa6 Dk4dv53YItmC9DySCojzPLtBxMkfRmN4gREOSzyL/Uls4cS0jKpykon7iSCrKaVeOLZu u8bKQXubNAlp62uf1TIzOfEgT2lnodv3UTQUjw6+c+y/v+ws9N6yDIHuVUB/nVG8BlQd BQgw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:date:in-reply-to :subject:cc:to:from:user-agent:references; bh=QMnHp6g1NplpzQeiDhir4B72pYZPsffmopka5OByLh4=; b=EeIj8hGWWNpwKPvmkiBXR7V0zDPjQEPwgNwCqnasoO5lztJQkN2q47F2mn/zdl8Wll /uH6moneA746UaHtwG9PPjdo7s5B+8mZfWfBxZEol2LxETsMMSKsjuEnC/ErAuUrDovb RHbNv8/N7NdK+5HnoBzow7bHtJ3/c/keFGYvXweainXmC/CGuLRs4mh+mYRmIkFj7mmS rBeS3OUJDcM3bDyF0T0YovSIBD1F1sqb3YBb2j2Rg+95S/LFNfX07fXNKseOxbP/su3/ KrWXi9aJO9puvJWGzLECXepJ30jFCAqQOfRWBy6IEZYctW1+xfurMkJ9K25YMJ5SStj5 Jc5g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id de19si2049570edb.285.2020.07.09.06.07.33; Thu, 09 Jul 2020 06:07:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726517AbgGINGo (ORCPT + 99 others); Thu, 9 Jul 2020 09:06:44 -0400 Received: from foss.arm.com ([217.140.110.172]:57408 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726347AbgGINGn (ORCPT ); Thu, 9 Jul 2020 09:06:43 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id ED4C01FB; Thu, 9 Jul 2020 06:06:42 -0700 (PDT) Received: from e113632-lin (e113632-lin.cambridge.arm.com [10.1.194.46]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B6F983FA00; Thu, 9 Jul 2020 06:06:41 -0700 (PDT) References: <20200702144258.19326-1-vincent.guittot@linaro.org> User-agent: mu4e 0.9.17; emacs 26.3 From: Valentin Schneider To: Vincent Guittot Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, linux-kernel@vger.kernel.org Subject: Re: [PATCH] sched/fair: handle case of task_h_load() returning 0 In-reply-to: <20200702144258.19326-1-vincent.guittot@linaro.org> Date: Thu, 09 Jul 2020 14:06:35 +0100 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/07/20 15:42, Vincent Guittot wrote: > task_h_load() can return 0 in some situations like running stress-ng > mmapfork, which forks thousands of threads, in a sched group on a 224 cores > system. The load balance doesn't handle this correctly because > env->imbalance never decreases and it will stop pulling tasks only after > reaching loop_max, which can be equal to the number of running tasks of > the cfs. Make sure that imbalance will be decreased by at least 1. > > misfit task is the other feature that doesn't handle correctly such > situation although it's probably more difficult to face the problem > because of the smaller number of CPUs and running tasks on heterogenous > system. > > We can't simply ensure that task_h_load() returns at least one because it > would imply to handle underrun in other places. > > Signed-off-by: Vincent Guittot I dug some more into this; if I got my math right, this can be reproduced with a single task group below the root. Forked tasks get max load, so this can be tried out with either tons of forks or tons of CPU hogs. We need p->se.avg.load_avg * cfs_rq->h_load ----------------------------------- < 1 cfs_rq_load_avg(cfs_rq) + 1 Assuming homogeneous system with tasks spread out all over (no other tasks interfering), that should boil down to 1024 * (tg.shares / nr_cpus) --------------------------- < 1 1024 * (nr_tasks_on_cpu) IOW tg.shares / nr_cpus < nr_tasks_on_cpu If we get tasks nicely spread out, a simple condition to hit this should be to have more tasks than shares. I can hit task_h_load=0 with the following on my Juno (pinned to one CPU to make things simpler; big.LITTLE doesn't yield equal weights between CPUs): cgcreate -g cpu:tg0 echo 128 > /sys/fs/cgroup/cpu/tg0/cpu.shares for ((i=0; i<130; i++)); do # busy loop of your choice taskset -c 0 ./loop.sh & echo $! > /sys/fs/cgroup/cpu/tg0/tasks done > --- > kernel/sched/fair.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 6fab1d17c575..62747c24aa9e 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4049,7 +4049,13 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq) > return; > } > > - rq->misfit_task_load = task_h_load(p); > + /* > + * Make sure that misfit_task_load will not be null even if > + * task_h_load() returns 0. misfit_task_load is only used to select > + * rq with highest load so adding 1 will not modify the result > + * of the comparison. > + */ > + rq->misfit_task_load = task_h_load(p) + 1; For here and below; wouldn't it be a tad cleaner to just do foo = max(task_h_load(p), 1); Otherwise, I think I've properly convinced myself we do want to have that in one form or another. So either way: Reviewed-by: Valentin Schneider > } > > #else /* CONFIG_SMP */ > @@ -7664,6 +7670,16 @@ static int detach_tasks(struct lb_env *env) > env->sd->nr_balance_failed <= env->sd->cache_nice_tries) > goto next; > > + /* > + * Depending of the number of CPUs and tasks and the > + * cgroup hierarchy, task_h_load() can return a null > + * value. Make sure that env->imbalance decreases > + * otherwise detach_tasks() will stop only after > + * detaching up to loop_max tasks. > + */ > + if (!load) > + load = 1; > + > env->imbalance -= load; > break;