Subject: Re: [PATCH v3] lib: optimize cpumask_local_spread()
To: Andrew Morton
References: <1573091048-10595-1-git-send-email-zhangshaokun@hisilicon.com>
 <20191107194942.734bc867e1c9578d07cf1712@linux-foundation.org>
CC: , yuqi jin , "Mike Rapoport" , Paul Burton , "Michal Hocko" , Michael Ellerman , "Anshuman Khandual"
From: Shaokun Zhang
Message-ID:
Date: Fri, 8 Nov 2019 13:50:05 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <20191107194942.734bc867e1c9578d07cf1712@linux-foundation.org>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List:
 linux-kernel@vger.kernel.org

Hi Andrew,

On 2019/11/8 11:49, Andrew Morton wrote:
> On Thu, 7 Nov 2019 09:44:08 +0800 Shaokun Zhang wrote:
>
>> In multi-processor NUMA systems, an I/O driver looks for the cpu cores
>> to which its IRQs shall be bound. When the cpu cores in the local numa
>> node have been used up, it is better to find the node closest to the
>> local numa node, instead of choosing any online cpu immediately.
>>
>> On a Huawei Kunpeng 920 server, there are 4 NUMA nodes (0 - 3) in the
>> 2-cpu system (0 - 1). We performed a PS (parameter server) business
>> test; the behavior of the service is that the client initiates a request
>> through the network card and the server responds to the request after
>> calculation. When two PS processes run on node2 and node3 separately
>> and the network card is located on 'node2', which is in cpu1, the
>> performance of node2 (26W QPS) and node3 (22W QPS) was different.
>> It is better that the NIC queues are bound to the cpu1 cores in turn,
>> then XPS will also be properly initialized, while cpumask_local_spread
>> only considers the local node. When the number of NIC queues exceeds
>> the number of cores in the local node, it falls back directly to any
>> online core. So when PS runs on node3 sending a calculated request,
>> the performance is not as good as on node2. The NIC and other I/O
>> devices have to initialize their interrupt binding, so if the cores of
>> the local node are used up, it is reasonable to return the node closest
>> to it.
>>
>> Let's optimize it and find the nearest node through NUMA distance for the
>> non-local NUMA nodes. The performance will be better if it returns the
>> nearest node rather than a random node.
>>
>> After this patch, the performance of node3 is the same as node2,
>> that is 26W QPS, when the network card is still in 'node2'. Since it will
>> return the closest non-local NUMA node rather than a random node, it does
>> no harm to others at least.
>
> This is a little nicer:
>
> --- a/lib/cpumask.c~lib-optimize-cpumask_local_spread-v3-fix
> +++ a/lib/cpumask.c
> @@ -254,7 +254,6 @@ static unsigned int __cpumask_local_spre
>  	BUG();
>  }
>
> -static DEFINE_SPINLOCK(spread_lock);
>  /**
>   * cpumask_local_spread - select the i'th cpu with local numa cpu's first
>   * @i: index number
> @@ -270,6 +269,7 @@ unsigned int cpumask_local_spread(unsign
>  {
>  	static int node_dist[MAX_NUMNODES];
>  	static bool used[MAX_NUMNODES];
> +	static DEFINE_SPINLOCK(spread_lock);

Good catch, thanks for fixing it.

Shaokun.

>  	unsigned long flags;
>  	int cpu, j, id;
>
> _
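
For readers following the thread, the nearest-node fallback described in the quoted changelog can be sketched in plain user-space C roughly as below. This is only an illustration under stated assumptions, not the code from the patch: NR_NODES, node_distance_tbl and find_nearest_node() are hypothetical names, and the distance values are made up in the style of a SLIT table.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4	/* hypothetical: 4 NUMA nodes, as on the test machine */

/* Hypothetical node distance table; the values are invented for illustration. */
static const int node_distance_tbl[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 22 },
	{ 12, 10, 22, 24 },
	{ 20, 22, 10, 12 },
	{ 22, 24, 12, 10 },
};

/*
 * Return the not-yet-used node with the smallest distance to @local,
 * or -1 once every node has been used.
 */
static int find_nearest_node(int local, const bool used[NR_NODES])
{
	int best = -1, min_dist = INT_MAX;

	assert(local >= 0 && local < NR_NODES);

	for (int n = 0; n < NR_NODES; n++) {
		if (used[n])
			continue;
		if (node_distance_tbl[local][n] < min_dist) {
			min_dist = node_distance_tbl[local][n];
			best = n;
		}
	}
	return best;
}
```

With the NIC on node2, a caller would take the cores of node2 first, mark it used, and then ask again; with the table above the next answer is node3, so queues beyond the local cores land on the closest node rather than on an arbitrary online cpu.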