Date: Mon, 24 Feb 2020 15:20:32 -0000
From: "tip-bot2 for Mel Gorman"
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity
Cc: Mel Gorman, Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
 Dietmar Eggemann, Valentin Schneider, Phil Auld, Hillf Danton, x86, LKML
In-Reply-To: <20200224095223.13361-7-mgorman@techsingularity.net>
References: <20200224095223.13361-7-mgorman@techsingularity.net>
MIME-Version: 1.0
Message-ID: <158255763235.28353.6329377716415522983.tip-bot2@tip-bot2>
X-Mailer: tip-git-log-daemon
Robot-ID:
Robot-Unsubscribe: Contact to get blacklisted from these emails
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     fb86f5b2119245afd339280099b4e9417cc0b03a
Gitweb:        https://git.kernel.org/tip/fb86f5b2119245afd339280099b4e9417cc0b03a
Author:        Mel Gorman
AuthorDate:    Mon, 24 Feb 2020 09:52:16
Committer:     Ingo Molnar
CommitterDate: Mon, 24 Feb 2020 11:36:35 +01:00

sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity

The standard load balancer generally tries to keep the number of running
tasks or idle CPUs balanced between NUMA domains. The NUMA balancer allows
tasks to move if there is spare capacity, but this conflicts with the load
balancer and utilisation between NUMA nodes gets badly skewed. This patch
uses similar logic in the NUMA balancer and the load balancer when deciding
whether a task migrating to its preferred node can use an idle CPU.

Signed-off-by: Mel Gorman
Signed-off-by: Ingo Molnar
Acked-by: Peter Zijlstra
Cc: Vincent Guittot
Cc: Juri Lelli
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Phil Auld
Cc: Hillf Danton
Link: https://lore.kernel.org/r/20200224095223.13361-7-mgorman@techsingularity.net
---
 kernel/sched/fair.c | 81 +++++++++++++++++++++++++++-----------------
 1 file changed, 50 insertions(+), 31 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc3d651..7a3c66f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1520,6 +1520,7 @@ struct task_numa_env {
 
 static unsigned long cpu_load(struct rq *rq);
 static unsigned long cpu_util(int cpu);
+static inline long adjust_numa_imbalance(int imbalance, int src_nr_running);
 
 static inline enum
 numa_type numa_classify(unsigned int imbalance_pct,
@@ -1594,11 +1595,6 @@ static bool load_too_imbalanced(long src_load, long dst_load,
 	long orig_src_load, orig_dst_load;
 	long src_capacity, dst_capacity;
 
-
-	/* If dst node has spare capacity, there is no real load imbalance */
-	if (env->dst_stats.node_type == node_has_spare)
-		return false;
-
 	/*
 	 * The load is corrected for the CPU capacity available on each node.
 	 *
@@ -1757,19 +1753,42 @@ unlock:
 static void task_numa_find_cpu(struct task_numa_env *env,
 				long taskimp, long groupimp)
 {
-	long src_load, dst_load, load;
 	bool maymove = false;
 	int cpu;
 
-	load = task_h_load(env->p);
-	dst_load = env->dst_stats.load + load;
-	src_load = env->src_stats.load - load;
-
 	/*
-	 * If the improvement from just moving env->p direction is better
-	 * than swapping tasks around, check if a move is possible.
+	 * If dst node has spare capacity, then check if there is an
+	 * imbalance that would be overruled by the load balancer.
 	 */
-	maymove = !load_too_imbalanced(src_load, dst_load, env);
+	if (env->dst_stats.node_type == node_has_spare) {
+		unsigned int imbalance;
+		int src_running, dst_running;
+
+		/*
+		 * Would movement cause an imbalance? Note that if src has
+		 * more running tasks then the imbalance is ignored as the
+		 * move improves the imbalance from the perspective of the
+		 * CPU load balancer.
+		 */
+		src_running = env->src_stats.nr_running - 1;
+		dst_running = env->dst_stats.nr_running + 1;
+		imbalance = max(0, dst_running - src_running);
+		imbalance = adjust_numa_imbalance(imbalance, src_running);
+
+		/* Use idle CPU if there is no imbalance */
+		if (!imbalance)
+			maymove = true;
+	} else {
+		long src_load, dst_load, load;
+		/*
+		 * If the improvement from just moving env->p direction is better
+		 * than swapping tasks around, check if a move is possible.
+		 */
+		load = task_h_load(env->p);
+		dst_load = env->dst_stats.load + load;
+		src_load = env->src_stats.load - load;
+		maymove = !load_too_imbalanced(src_load, dst_load, env);
+	}
 
 	for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
 		/* Skip this CPU if the source task cannot migrate */
@@ -8694,6 +8713,21 @@ next_group:
 	}
 }
 
+static inline long adjust_numa_imbalance(int imbalance, int src_nr_running)
+{
+	unsigned int imbalance_min;
+
+	/*
+	 * Allow a small imbalance based on a simple pair of communicating
+	 * tasks that remain local when the source domain is almost idle.
+	 */
+	imbalance_min = 2;
+	if (src_nr_running <= imbalance_min)
+		return 0;
+
+	return imbalance;
+}
+
 /**
  * calculate_imbalance - Calculate the amount of imbalance present within the
  *			 groups of a given sched_domain during load balance.
@@ -8790,24 +8824,9 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	}
 
 	/* Consider allowing a small imbalance between NUMA groups */
-	if (env->sd->flags & SD_NUMA) {
-		unsigned int imbalance_min;
-
-		/*
-		 * Compute an allowed imbalance based on a simple
-		 * pair of communicating tasks that should remain
-		 * local and ignore them.
-		 *
-		 * NOTE: Generally this would have been based on
-		 * the domain size and this was evaluated. However,
-		 * the benefit is similar across a range of workloads
-		 * and machines but scaling by the domain size adds
-		 * the risk that lower domains have to be rebalanced.
-		 */
-		imbalance_min = 2;
-		if (busiest->sum_nr_running <= imbalance_min)
-			env->imbalance = 0;
-	}
+	if (env->sd->flags & SD_NUMA)
+		env->imbalance = adjust_numa_imbalance(env->imbalance,
+					busiest->sum_nr_running);
 
 	return;
 }
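
As a minimal illustration of the spare-capacity path added to task_numa_find_cpu()
above, the stand-alone user-space sketch below (not kernel code; the helper
may_move_to_idle_cpu() and the sample task counts are hypothetical) recomputes the
post-move imbalance the way the hunk does and lets adjust_numa_imbalance() forgive
a small imbalance when the source node is almost idle:

/*
 * Stand-alone sketch (not kernel code) of the spare-capacity decision.
 * may_move_to_idle_cpu() and the sample task counts are hypothetical;
 * only the arithmetic mirrors the patch.
 */
#include <stdbool.h>
#include <stdio.h>

/*
 * Mirrors adjust_numa_imbalance(): tolerate a small imbalance when the
 * source domain is almost idle (e.g. a communicating pair of tasks).
 */
static long adjust_numa_imbalance(int imbalance, int src_nr_running)
{
	unsigned int imbalance_min = 2;

	if (src_nr_running <= imbalance_min)
		return 0;

	return imbalance;
}

/* Would moving one task to a node with spare capacity be allowed? */
static bool may_move_to_idle_cpu(int src_nr_running, int dst_nr_running)
{
	int src_running = src_nr_running - 1;	/* source after the task leaves */
	int dst_running = dst_nr_running + 1;	/* destination after it arrives */
	int imbalance = dst_running - src_running;

	if (imbalance < 0)			/* open-coded max(0, ...) */
		imbalance = 0;
	imbalance = adjust_numa_imbalance(imbalance, src_running);

	return imbalance == 0;			/* no imbalance: use the idle CPU */
}

int main(void)
{
	/* Hypothetical (src, dst) running-task counts */
	printf("src=2 dst=0 -> %s\n", may_move_to_idle_cpu(2, 0) ? "move" : "stay");
	printf("src=1 dst=1 -> %s\n", may_move_to_idle_cpu(1, 1) ? "move" : "stay");
	printf("src=4 dst=4 -> %s\n", may_move_to_idle_cpu(4, 4) ? "move" : "stay");
	return 0;
}

With those hypothetical inputs, the first two cases allow the move (the source is
the busier node, or the source is nearly idle so a small imbalance is tolerated),
while the third stays put because the move would create an imbalance the load
balancer would immediately pull back.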