From: Shaokun Zhang
CC: yuqi jin, Andrew Morton, Mike Rapoport, Paul Burton, Michal Hocko,
    Michael Ellerman, Anshuman Khandual, Shaokun Zhang
Subject: [PATCH v2] lib: optimize cpumask_local_spread()
Date: Mon, 4 Nov 2019 18:27:48 +0800
Message-ID: <1572863268-28585-1-git-send-email-zhangshaokun@hisilicon.com>
X-Mailer: git-send-email 2.7.4
X-Mailing-List: linux-kernel@vger.kernel.org

From: yuqi jin

In a multi-processor NUMA system, an I/O device may be served by CPUs
that belong to many different NUMA nodes.
Given a local NUMA node, it is better to fall back to the node closest
to it than to pick any online CPU immediately.

The current code only considers the local NUMA node; it does not
compare the distances to the non-local nodes. Optimize it to find the
nearest node by NUMA distance. Returning a CPU on the nearest node
performs better than returning a CPU on an arbitrary node.

With a Parameter Server workload driven through a NIC on a Huawei
Kunpeng 920 SoC:
Without the patch, throughput is 22W (220,000) QPS;
With the patch, throughput improves to 26W (260,000) QPS.

Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Paul Burton
Cc: Michal Hocko
Cc: Michael Ellerman
Cc: Anshuman Khandual
Signed-off-by: yuqi jin
Signed-off-by: Shaokun Zhang
---
 lib/cpumask.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 81 insertions(+), 12 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index 0cb672eb107c..15d8940f32a8 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -192,18 +192,39 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
 }
 #endif
 
-/**
- * cpumask_local_spread - select the i'th cpu with local numa cpu's first
- * @i: index number
- * @node: local numa_node
- *
- * This function selects an online CPU according to a numa aware policy;
- * local cpus are returned first, followed by non-local ones, then it
- * wraps around.
- *
- * It's not very efficient, but useful for setup.
- */
-unsigned int cpumask_local_spread(unsigned int i, int node)
+static void calc_node_distance(int *node_dist, int node)
+{
+	int i;
+
+	for (i = 0; i < nr_node_ids; i++)
+		node_dist[i] = node_distance(node, i);
+}
+
+static int find_nearest_node(int *node_dist, bool *used)
+{
+	int i, min_dist = node_dist[0], node_id = -1;
+
+	/* Choose the first unused node to compare */
+	for (i = 0; i < nr_node_ids; i++) {
+		if (used[i] == 0) {
+			min_dist = node_dist[i];
+			node_id = i;
+			break;
+		}
+	}
+
+	/* Compare and return the nearest node */
+	for (i = 0; i < nr_node_ids; i++) {
+		if (node_dist[i] < min_dist && used[i] == 0) {
+			min_dist = node_dist[i];
+			node_id = i;
+		}
+	}
+
+	return node_id;
+}
+
+static unsigned int __cpumask_local_spread(unsigned int i, int node)
 {
 	int cpu;
 
@@ -231,4 +252,52 @@ unsigned int cpumask_local_spread(unsigned int i, int node)
 	}
 	BUG();
 }
+
+/**
+ * cpumask_local_spread - select the i'th cpu with local numa cpu's first
+ * @i: index number
+ * @node: local numa_node
+ *
+ * This function selects an online CPU according to a numa aware policy;
+ * local cpus are returned first, followed by the nearest non-local ones,
+ * then it wraps around.
+ *
+ * It's not very efficient, but useful for setup.
+ */
+unsigned int cpumask_local_spread(unsigned int i, int node)
+{
+	int node_dist[MAX_NUMNODES] = {0};
+	bool used[MAX_NUMNODES] = {0};
+	int cpu, j, id;
+
+	/* Wrap: we always want a cpu. */
+	i %= num_online_cpus();
+
+	if (node == NUMA_NO_NODE) {
+		for_each_cpu(cpu, cpu_online_mask)
+			if (i-- == 0)
+				return cpu;
+	} else {
+		if (nr_node_ids > MAX_NUMNODES)
+			return __cpumask_local_spread(i, node);
+
+		calc_node_distance(node_dist, node);
+		for (j = 0; j < nr_node_ids; j++) {
+			id = find_nearest_node(node_dist, used);
+			if (id < 0)
+				break;
+
+			for_each_cpu_and(cpu, cpumask_of_node(id),
+					 cpu_online_mask)
+				if (i-- == 0)
+					return cpu;
+			used[id] = 1;
+		}
+
+		for_each_cpu(cpu, cpu_online_mask)
+			if (i-- == 0)
+				return cpu;
+	}
+	BUG();
+}
 EXPORT_SYMBOL(cpumask_local_spread);
-- 
2.7.4
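
For readers who want to try the nearest-node-first ordering outside the
kernel, here is a minimal user-space sketch of the idea. It is not the
patch's code: NR_NODES, node_dist_table, nearest_unused_node() and
spread_order() are hypothetical names, and the distance values are made
up for illustration (10 meaning local, as in a typical SLIT table). The
sketch also folds the two scans of find_nearest_node() into one loop,
which is an assumption about style, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 4	/* hypothetical node count, stands in for nr_node_ids */

/*
 * Hypothetical distance table: 10 means local, larger means farther.
 * These numbers are invented for the example.
 */
static const int node_dist_table[NR_NODES][NR_NODES] = {
	{ 10, 16, 32, 33 },
	{ 16, 10, 25, 32 },
	{ 32, 25, 10, 16 },
	{ 33, 32, 16, 10 },
};

/* Return the node closest to @node that is not marked in @used, or -1. */
static int nearest_unused_node(int node, const bool *used)
{
	int i, best = -1;

	assert(node >= 0 && node < NR_NODES && used != NULL);

	for (i = 0; i < NR_NODES; i++) {
		if (used[i])
			continue;
		/* Strict '<' keeps the lowest-numbered node on a tie. */
		if (best < 0 ||
		    node_dist_table[node][i] < node_dist_table[node][best])
			best = i;
	}
	return best;
}

/* Fill @order with all node ids, nearest to @node first; return the count. */
static int spread_order(int node, int *order)
{
	bool used[NR_NODES] = { false };
	int n = 0, id;

	while ((id = nearest_unused_node(node, used)) >= 0) {
		order[n++] = id;
		used[id] = true;
	}
	return n;
}
```

With this table, spread_order(2, order) visits the nodes in the order
2, 3, 1, 0: the local node first, then the rest by increasing distance.
In the patch, the CPUs of each node are consumed in that same node order
until the i'th online CPU is reached.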