From: Vincent Guittot
Date: Thu, 2 Jul 2020 18:28:45 +0200
Subject: Re: [PATCH] sched/fair: handle case of task_h_load() returning 0
To: Valentin Schneider
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel
References: <20200702144258.19326-1-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2 Jul 2020 at 18:11, Valentin Schneider wrote:
>
> On 02/07/20 15:42, Vincent Guittot wrote:
> > task_h_load() can return 0 in some situations, such as running stress-ng
> > mmapfork, which forks thousands of threads, in a sched group on a 224-core
> > system. Load balancing doesn't handle this correctly, because
> > env->imbalance never decreases, and it will stop pulling tasks only after
> > reaching loop_max, which can be equal to the number of running tasks of
> > the cfs_rq. Make sure that the imbalance is decreased by at least 1.
> >
> > Misfit task handling is the other feature that doesn't handle this
> > situation correctly, although the problem is probably harder to hit
> > because of the smaller number of CPUs and running tasks on heterogeneous
> > systems.
> >
> > We can't simply ensure that task_h_load() returns at least one, because
> > that would imply handling the underrun in other places.
>
> Nasty one, that...
>
> Random thought: isn't that the kind of thing we have scale_load() and
> scale_load_down() for? There are more uses of task_h_load() than I would
> like for this, but if we upscale its output (or introduce an upscaled
> variant), we could do something like:
>
> ---
> detach_tasks()
> {
>         long imbalance = env->imbalance;
>
>         if (env->migration_type == migrate_load)
>                 imbalance = scale_load(imbalance);
>
>         while (!list_empty(tasks)) {
>                 /* ... */
>                 switch (env->migration_type) {
>                 case migrate_load:
>                         load = task_h_load_upscaled(p);
>                         /* ... usual bits here ... */
>                         lsub_positive(&env->imbalance, load);
>                         break;
>                 /* ... */
>                 }
>
>                 if (!scale_load_down(env->imbalance))
>                         break;
>         }
> }
> ---
>
> It's not perfect, and there's still the misfit situation to sort out -
> still, do you think this is something we could go towards?

This will not work for 32-bit systems. For 64-bit, I have to think a bit
more about whether the upscale would fix all cases and support propagation
across a hierarchy.
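For background, the upscaling only exists on 64-bit: on 32-bit kernels
scale_load() and scale_load_down() are already identity macros, so there is
no extra fixed-point resolution to play with. A simplified sketch of the
definitions in kernel/sched/sched.h (from memory; the exact form varies
across kernel versions, e.g. newer scale_load_down() also clamps small
values instead of shifting straight down):

---
/* Sketch only - see kernel/sched/sched.h for the real definitions. */
#ifdef CONFIG_64BIT
# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
# define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
#else
# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w)		(w)	/* no extra resolution on 32-bit */
# define scale_load_down(w)	(w)
#endif
---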
And in that case we could also consider making scale_load()/scale_load_down()
a nop all the time.

> >
> > Signed-off-by: Vincent Guittot
> > ---
> >  kernel/sched/fair.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6fab1d17c575..62747c24aa9e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4049,7 +4049,13 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> >               return;
> >       }
> >
> > -     rq->misfit_task_load = task_h_load(p);
> > +     /*
> > +      * Make sure that misfit_task_load will not be null even if
> > +      * task_h_load() returns 0. misfit_task_load is only used to select
> > +      * the rq with the highest load, so adding 1 will not change the
> > +      * result of the comparison.
> > +      */
> > +     rq->misfit_task_load = task_h_load(p) + 1;
> >  }
> >
> >  #else /* CONFIG_SMP */
> > @@ -7664,6 +7670,16 @@ static int detach_tasks(struct lb_env *env)
> >                       env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
> >                       goto next;
> >
> > +             /*
> > +              * Depending on the number of CPUs and tasks and the
> > +              * cgroup hierarchy, task_h_load() can return a null
> > +              * value. Make sure that env->imbalance decreases,
> > +              * otherwise detach_tasks() will stop only after
> > +              * detaching up to loop_max tasks.
> > +              */
> > +             if (!load)
> > +                     load = 1;
> > +
> >               env->imbalance -= load;
> >               break;
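To make the failure mode concrete, here is a small stand-alone illustration
(user-space toy code, not the kernel implementation - the names only mirror
the kernel's and the numbers are made up) of how the per-task share of a
group's hierarchical load can round down to 0 when thousands of threads sit
in one group, and how bounding the contribution at 1 keeps env->imbalance
shrinking:

---
/* toy_underflow.c - illustration only, not kernel code. */
#include <stdio.h>

/*
 * Same shape as the kernel's task_h_load(): the task's share of the
 * group's hierarchical load, computed with integer division.
 */
static unsigned long toy_task_h_load(unsigned long task_load_avg,
				     unsigned long group_h_load,
				     unsigned long group_load_avg)
{
	return (task_load_avg * group_h_load) / (group_load_avg + 1);
}

int main(void)
{
	/*
	 * Made-up numbers: thousands of mostly-idle forked threads share
	 * one group, so each task's load is tiny compared with the
	 * group's summed load.
	 */
	unsigned long task_load_avg = 3;
	unsigned long group_h_load = 1024;
	unsigned long group_load_avg = 12000;
	unsigned long imbalance = 100;

	unsigned long load = toy_task_h_load(task_load_avg, group_h_load,
					     group_load_avg);

	printf("per-task load = %lu\n", load);	/* prints 0 */

	/*
	 * Without a lower bound, imbalance -= load makes no progress and
	 * detach_tasks() only stops at loop_max; the patch bounds it at 1.
	 */
	if (!load)
		load = 1;
	imbalance -= load;

	printf("imbalance after one detach = %lu\n", imbalance);
	return 0;
}
---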