From: Vincent Guittot
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, linux-kernel@vger.kernel.org
Cc: valentin.schneider@arm.com, Vincent Guittot
Subject: [PATCH] sched/fair: handle case of task_h_load() returning 0
Date: Thu, 2 Jul 2020 16:42:58 +0200
Message-Id: <20200702144258.19326-1-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

task_h_load() can return 0 in some situations, such as running stress-ng
mmapfork, which forks thousands of threads, in a sched group on a 224-core
system. The load balance doesn't handle this correctly because
env->imbalance never decreases, so it stops pulling tasks only after
reaching loop_max, which can be equal to the number of running tasks of
the cfs. Make sure that imbalance is decreased by at least 1.
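
For illustration only, the failure mode described above can be sketched in
user-space C. This is a simplified model, not the kernel code: the names
model_task_h_load() and model_detach_tasks() are hypothetical, the integer
division is only assumed to resemble how a task's load is scaled by its
group's share, and the loop keeps nothing of detach_tasks() except the
imbalance bookkeeping and the loop_max bound.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for task_h_load(): scale a task's load by its
 * group's hierarchical share using integer division. With a small share
 * and a large group load, the quotient truncates to 0.
 */
static unsigned long model_task_h_load(unsigned long task_load,
				       unsigned long group_h_load,
				       unsigned long group_load_avg)
{
	return task_load * group_h_load / (group_load_avg + 1);
}

/*
 * Hypothetical model of the detach loop: every task contributes
 * load_per_task, and the loop stops once the imbalance is consumed or
 * loop_max tasks have been visited. Returns the number of tasks detached.
 * When clamp is non-zero, a zero load is raised to 1 before it is
 * subtracted, as the patch proposes.
 */
static unsigned int model_detach_tasks(unsigned long imbalance,
				       unsigned long load_per_task,
				       unsigned int loop_max,
				       int clamp)
{
	long remaining = (long)imbalance;
	unsigned int detached = 0;

	/* This model only makes sense for a positive imbalance. */
	assert(imbalance > 0);

	while (detached < loop_max) {
		unsigned long load = load_per_task;

		if (clamp && !load)
			load = 1;

		detached++;
		remaining -= (long)load;

		/* Stop as soon as the imbalance has been consumed. */
		if (remaining <= 0)
			break;
	}

	return detached;
}
```

In this model, model_task_h_load(10, 50, 20000) truncates to 0. With an
imbalance of 8 and loop_max of 5000, the unclamped loop then detaches all
5000 tasks, whereas the clamped loop stops after 8.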

Misfit task is the other feature that doesn't handle this situation
correctly, although the problem is probably harder to hit there because of
the smaller number of CPUs and running tasks on heterogeneous systems.

We can't simply ensure that task_h_load() returns at least one because
that would imply handling underrun in other places.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6fab1d17c575..62747c24aa9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4049,7 +4049,13 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 		return;
 	}
 
-	rq->misfit_task_load = task_h_load(p);
+	/*
+	 * Make sure that misfit_task_load will not be null even if
+	 * task_h_load() returns 0. misfit_task_load is only used to select
+	 * rq with highest load so adding 1 will not modify the result
+	 * of the comparison.
+	 */
+	rq->misfit_task_load = task_h_load(p) + 1;
 }
 
 #else /* CONFIG_SMP */
@@ -7664,6 +7670,16 @@ static int detach_tasks(struct lb_env *env)
 			    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
 				goto next;
 
+			/*
+			 * Depending on the number of CPUs and tasks and the
+			 * cgroup hierarchy, task_h_load() can return a null
+			 * value. Make sure that env->imbalance decreases
+			 * otherwise detach_tasks() will stop only after
+			 * detaching up to loop_max tasks.
+			 */
+			if (!load)
+				load = 1;
+
 			env->imbalance -= load;
 			break;
 
-- 
2.17.1