From: Zhang Qiao
Subject: [PATCH] sched/numa: Correct NUMA imbalance calculation
Date: Fri, 24 May 2024 11:54:38 +0800
Message-ID: <20240524035438.2701479-1-zhangqiao22@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When performing load balancing, a NUMA imbalance is allowed if the
number of busy CPUs is below the maximum threshold; a pair of
communicating tasks is then kept on the current node when the source
domain is lightly loaded. In many cases this prevents communicating
tasks from being pulled apart.

But when I ran the lmbench bw_pipe test case, the behaviour was not
quite consistent with the above expectation: the communicating tasks
were migrated to two different NUMA nodes.

There may be two reasons for this issue:

1. calculate_imbalance() uses local->sum_nr_running, which may not be
   accurate. The communicating tasks run in the busiest group, so
   busiest->sum_nr_running should be used instead.

2. In calculate_imbalance(), idle CPUs are used to calculate the
   imbalance, but group_weight may differ between the local and
   busiest groups (my server has 4 NUMA nodes and the kernel builds a
   3-level NUMA sched_domain hierarchy, so some sched_groups have
   different weights). In this case, even if both groups are nearly
   idle, the calculated imbalance can be very large; the difference in
   busy CPUs between the groups is a more appropriate imbalance value
   (see the illustrative sketch after the patch).

For lmbench bw_pipe (bw_pipe -P 1):

  v6.6:               1776.7533 MB/sec
  v6.6 + this patch:  4323 MB/sec

Signed-off-by: Zhang Qiao
---
 kernel/sched/fair.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..c6170cde9c14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1323,7 +1323,6 @@ static inline bool is_core_idle(int cpu)
 }
 
 #ifdef CONFIG_NUMA
-#define NUMA_IMBALANCE_MIN 2
 
 static inline long
 adjust_numa_imbalance(int imbalance, int dst_running, int imb_numa_nr)
@@ -1342,7 +1341,7 @@ adjust_numa_imbalance(int imbalance, int dst_running, int imb_numa_nr)
 	 * Allow a small imbalance based on a simple pair of communicating
 	 * tasks that remain local when the destination is lightly loaded.
 	 */
-	if (imbalance <= NUMA_IMBALANCE_MIN)
+	if (imbalance <= imb_numa_nr)
 		return 0;
 
 	return imbalance;
@@ -10727,14 +10726,15 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 		 */
 		env->migration_type = migrate_task;
 		env->imbalance = max_t(long, 0,
-				       (local->idle_cpus - busiest->idle_cpus));
+				       (busiest->group_weight - busiest->idle_cpus) -
+				       (local->group_weight - local->idle_cpus));
 	}
 
 #ifdef CONFIG_NUMA
 	/* Consider allowing a small imbalance between NUMA groups */
 	if (env->sd->flags & SD_NUMA) {
 		env->imbalance = adjust_numa_imbalance(env->imbalance,
-						       local->sum_nr_running + 1,
+						       busiest->sum_nr_running,
 						       env->sd->imb_numa_nr);
 	}
 #endif
-- 
2.18.0.huawei.25
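
The following stand-alone sketch is not part of the patch; it only
models the arithmetic in point 2 with hypothetical group sizes (a
32-CPU local group and a 16-CPU busiest group, one busy CPU each) to
show how the pre-patch idle-CPU difference inflates the imbalance
while the busy-CPU difference does not.

/* numa_imb_sketch.c - toy model of the two imbalance formulas above.
 * The struct and group sizes are hypothetical and exist only for
 * illustration; this is userspace code, not kernel code.
 */
#include <stdio.h>

struct sg_stats {
	long group_weight;	/* CPUs in the sched_group */
	long idle_cpus;		/* idle CPUs in the sched_group */
};

static long max_long(long a, long b)
{
	return a > b ? a : b;
}

int main(void)
{
	/* groups of different weight, each running a single task */
	struct sg_stats local   = { .group_weight = 32, .idle_cpus = 31 };
	struct sg_stats busiest = { .group_weight = 16, .idle_cpus = 15 };

	/* before the patch: difference of idle CPUs */
	long old_imb = max_long(0, local.idle_cpus - busiest.idle_cpus);

	/* after the patch: difference of busy CPUs */
	long new_imb = max_long(0,
			(busiest.group_weight - busiest.idle_cpus) -
			(local.group_weight - local.idle_cpus));

	printf("idle-CPU based imbalance: %ld\n", old_imb);	/* prints 16 */
	printf("busy-CPU based imbalance: %ld\n", new_imb);	/* prints 0  */
	return 0;
}

With one busy CPU on each side, the idle-CPU formula still reports an
imbalance of 16 purely because the groups differ in size, which by the
reasoning above can be enough to pull a communicating pair apart; the
busy-CPU formula reports 0 and the pair stays local.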