From: Vincent Guittot
Date: Thu, 9 Jul 2020 15:51:11 +0200
Subject: Re: [PATCH] sched/fair: handle case of task_h_load() returning 0
To: Valentin Schneider
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel
References: <20200702144258.19326-1-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 9 Jul 2020 at 15:06, Valentin Schneider wrote:
>
> On 02/07/20 15:42, Vincent Guittot wrote:
> > task_h_load() can return 0 in some situations, for example when running
> > stress-ng mmapfork, which forks thousands of threads, in a sched group on
> > a 224-core system. The load balancer doesn't handle this correctly
> > because env->imbalance never decreases, so it stops pulling tasks only
> > after reaching loop_max, which can be equal to the number of running
> > tasks on the cfs_rq. Make sure that imbalance will be decreased by at
> > least 1.
> >
> > Misfit task handling is the other feature that doesn't handle this
> > situation correctly, although it's probably harder to hit there because
> > of the smaller number of CPUs and running tasks on a heterogeneous
> > system.
> >
> > We can't simply ensure that task_h_load() returns at least one, because
> > that would imply handling underrun in other places.
> >
> > Signed-off-by: Vincent Guittot
>
> I dug some more into this; if I got my math right, this can be reproduced
> with a single task group below the root. Forked tasks get max load, so this
> can be tried out with either tons of forks or tons of CPU hogs.
>
> We need
>
>     p->se.avg.load_avg * cfs_rq->h_load
>     ----------------------------------- < 1
>         cfs_rq_load_avg(cfs_rq) + 1
>
> Assuming a homogeneous system with tasks spread out all over (no other
> tasks interfering), that should boil down to
>
>     1024 * (tg.shares / nr_cpus)
>     ---------------------------- < 1
>       1024 * (nr_tasks_on_cpu)
>
> IOW
>
>     tg.shares / nr_cpus < nr_tasks_on_cpu
>
> If we get tasks nicely spread out, a simple condition to hit this should be
> to have more tasks than shares.
>
> I can hit task_h_load=0 with the following on my Juno (pinned to one CPU to
> make things simpler; big.LITTLE doesn't yield equal weights between CPUs):
>
>   cgcreate -g cpu:tg0
>
>   echo 128 > /sys/fs/cgroup/cpu/tg0/cpu.shares
>
>   for ((i=0; i<130; i++)); do
>       # busy loop of your choice
>       taskset -c 0 ./loop.sh &
>       echo $! > /sys/fs/cgroup/cpu/tg0/tasks
>   done
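To make the truncation concrete, here is a minimal user-space sketch of the
arithmetic above. The numbers are assumptions matching the pinned reproducer
(per-task load_avg of roughly 1024 for an always-runnable task, h_load of
roughly tg.shares = 128 since the whole group sits on one CPU, 130 tasks on
that CPU); it is not the kernel code, just the same integer division:

#include <stdio.h>

int main(void)
{
	/* Assumed round numbers for the pinned reproducer, not measured values. */
	unsigned long task_load_avg = 1024;	/* always-runnable busy loop */
	unsigned long h_load        = 128;	/* ~ tg.shares, whole group on one CPU */
	unsigned long nr_tasks      = 130;	/* tasks pinned to CPU0 */
	unsigned long cfs_rq_load   = nr_tasks * task_load_avg;

	/*
	 * Same shape as the formula quoted above:
	 * p->se.avg.load_avg * cfs_rq->h_load / (cfs_rq_load_avg(cfs_rq) + 1)
	 */
	unsigned long task_h_load = task_load_avg * h_load / (cfs_rq_load + 1);

	/* 1024 * 128 = 131072, divided by 133121 truncates to 0 */
	printf("task_h_load = %lu\n", task_h_load);
	return 0;
}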
> > ---
> >  kernel/sched/fair.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6fab1d17c575..62747c24aa9e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4049,7 +4049,13 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> >               return;
> >       }
> >
> > -     rq->misfit_task_load = task_h_load(p);
> > +     /*
> > +      * Make sure that misfit_task_load will not be null even if
> > +      * task_h_load() returns 0. misfit_task_load is only used to select
> > +      * rq with highest load so adding 1 will not modify the result
> > +      * of the comparison.
> > +      */
> > +     rq->misfit_task_load = task_h_load(p) + 1;
>
> For here and below; wouldn't it be a tad cleaner to just do
>
>     foo = max(task_h_load(p), 1);

+1

For the one below, my goal was mainly not to impact the result of the tests
before applying the +1, but doing it before won't change the results.
I'm going to update it.

> Otherwise, I think I've properly convinced myself we do want to have
> that in one form or another. So either way:
>
> Reviewed-by: Valentin Schneider

Thanks

> >       }
> >
> >  #else /* CONFIG_SMP */
> > @@ -7664,6 +7670,16 @@ static int detach_tasks(struct lb_env *env)
> >                       env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
> >                       goto next;
> >
> > +             /*
> > +              * Depending on the number of CPUs and tasks and the
> > +              * cgroup hierarchy, task_h_load() can return a null
> > +              * value. Make sure that env->imbalance decreases,
> > +              * otherwise detach_tasks() will stop only after
> > +              * detaching up to loop_max tasks.
> > +              */
> > +             if (!load)
> > +                     load = 1;
> > +
> >               env->imbalance -= load;
> >               break;
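For reference, a tiny self-contained sketch of the two equivalent guards
discussed above; both make a zero task_h_load() still decrease env->imbalance
by at least 1. The stub function is hypothetical and only stands in for the
truncated task_h_load(); this is an illustration, not the actual kernel
change:

#include <stdio.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

/* Stand-in for task_h_load() hitting the truncation described above. */
static unsigned long task_h_load_stub(void)
{
	return 0;
}

int main(void)
{
	long imbalance = 4;
	unsigned long load;

	/* Guard used in the posted patch. */
	load = task_h_load_stub();
	if (!load)
		load = 1;
	imbalance -= load;

	/* max()-based guard suggested in review. */
	imbalance -= max(task_h_load_stub(), 1UL);

	printf("imbalance = %ld\n", imbalance);	/* 2: decreased both times */
	return 0;
}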