Subject: Re: [PATCH v2] lib: optimize cpumask_local_spread()
From: Shaokun Zhang
To: Michal Hocko, Andrew Morton
CC: linux-kernel@vger.kernel.org, yuqi jin, Mike Rapoport, Paul Burton,
 Michael Ellerman, Anshuman Khandual
Date: Wed, 6 Nov 2019 16:02:29 +0800
In-Reply-To: <20191106071742.GB8314@dhcp22.suse.cz>
References: <1572863268-28585-1-git-send-email-zhangshaokun@hisilicon.com>
 <20191105070141.GF22672@dhcp22.suse.cz>
 <20191105173359.39052327cf221d9c4b26b783@linux-foundation.org>
 <20191106071742.GB8314@dhcp22.suse.cz>
Hi Michal,

On 2019/11/6 15:17, Michal Hocko wrote:
> On Tue 05-11-19 17:33:59, Andrew Morton wrote:
>> On Tue, 5 Nov 2019 08:01:41 +0100 Michal Hocko wrote:
>>
>>> On Mon 04-11-19 18:27:48, Shaokun Zhang wrote:
>>>> From: yuqi jin
>>>>
>>>> In a multi-processor NUMA system, the CPUs serving an I/O device may
>>>> span several NUMA nodes. Once the CPUs of the local NUMA node are
>>>> exhausted, it is better to pick a CPU on the node closest to the
>>>> local one instead of falling back to an arbitrary online CPU.
>>>>
>>>> The current code only considers the local NUMA node; for the
>>>> non-local case it never computes the distances between NUMA nodes.
>>>> Optimize it to find the nearest node by NUMA distance: returning a
>>>> CPU on the nearest node performs better than returning one on a
>>>> random node.
>>>
>>> Numbers please
>>
>> The changelog had
>>
>> : When the Parameter Server workload is tested using a NIC device on a
>> : Huawei Kunpeng 920 SoC:
>> : Without the patch, the performance is 220,000 QPS;
>> : With this patch, the performance improves to 260,000 QPS.
>
> Maybe it is just me but this doesn't really tell me a lot. What is
> the Parameter Server workload? What do I do to replicate those
> numbers?

I will give it a better description in the next version. Since the
change only makes the non-local fallback return a CPU on the nearest
node instead of a random one, it should be harmless to other users,
right?

> Is this really specific to the Kunpeng 920 server? What is the usual
> variance of the performance numbers?
>
>>> [...]
>>>> +/**
>>>> + * cpumask_local_spread - select the i'th cpu with local numa cpu's first
>>>> + * @i: index number
>>>> + * @node: local numa_node
>>>> + *
>>>> + * This function selects an online CPU according to a numa aware policy;
>>>> + * local cpus are returned first, followed by the nearest non-local ones,
>>>> + * then it wraps around.
>>>> + *
>>>> + * It's not very efficient, but useful for setup.
>>>> + */
>>>> +unsigned int cpumask_local_spread(unsigned int i, int node)
>>>> +{
>>>> +	int node_dist[MAX_NUMNODES] = {0};
>>>> +	bool used[MAX_NUMNODES] = {0};
>>>
>>> Ugh. This might be a lot of stack space. Some distro kernels use a
>>> large NODE_SHIFT (e.g. 10), so this would be 4kB of stack space just
>>> for node_dist.
>>
>> Yes, that's big. From a quick peek I suspect we could get by using an
>> array of unsigned shorts here, but that might be fragile over time
>> even if it works now?
>
> Whatever data type we use, it will still be quite large to keep on
> the stack.
>
>> Perhaps we could make it a statically allocated array and protect the
>> entire thing with a spin_lock_irqsave()? It's not a frequently called
>> function.
>
> This is what I was suggesting in previous review feedback.

OK, will do it in the next version; a rough, untested sketch of what I
have in mind is below my sign-off.

Thanks,
Shaokun
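
The sketch below is only to show the direction, not the exact code I
will post: the scratch arrays move off the stack into statics guarded
by a spinlock, and the nearest node is picked by node_distance(). The
names spread_lock and node_used are placeholders, untested:

#include <linux/bug.h>
#include <linux/cpumask.h>
#include <linux/nodemask.h>
#include <linux/numa.h>
#include <linux/spinlock.h>
#include <linux/topology.h>

/*
 * Scratch space shared by all callers.  cpumask_local_spread() is a
 * slow setup-time path, so serializing callers is acceptable, and
 * with NODE_SHIFT == 10 this keeps several kB off the caller's stack.
 */
static DEFINE_SPINLOCK(spread_lock);
static int node_dist[MAX_NUMNODES];
static bool node_used[MAX_NUMNODES];

unsigned int cpumask_local_spread(unsigned int i, int node)
{
	unsigned long flags;
	int nid, sel, cpu;

	/* Wrap: we always want a cpu. */
	i %= num_online_cpus();

	if (node == NUMA_NO_NODE) {
		for_each_cpu(cpu, cpu_online_mask)
			if (i-- == 0)
				return cpu;
		BUG();
	}

	spin_lock_irqsave(&spread_lock, flags);

	/* Record the distance from @node to every node. */
	for (nid = 0; nid < nr_node_ids; nid++) {
		node_dist[nid] = node_distance(node, nid);
		node_used[nid] = false;
	}

	/*
	 * Visit nodes in order of increasing distance from @node,
	 * handing out each node's online CPUs before moving on to
	 * the next-nearest node.
	 */
	for (;;) {
		sel = NUMA_NO_NODE;
		for (nid = 0; nid < nr_node_ids; nid++) {
			if (!node_used[nid] &&
			    (sel == NUMA_NO_NODE ||
			     node_dist[nid] < node_dist[sel]))
				sel = nid;
		}
		if (sel == NUMA_NO_NODE)
			break;
		node_used[sel] = true;

		for_each_cpu_and(cpu, cpumask_of_node(sel),
				 cpu_online_mask) {
			if (i-- == 0) {
				spin_unlock_irqrestore(&spread_lock,
						       flags);
				return cpu;
			}
		}
	}

	spin_unlock_irqrestore(&spread_lock, flags);
	BUG();
}

The node-selection loop is O(nr_node_ids^2), but since this function
only runs at setup time that looked acceptable to me; sorting the
distance table once would also work.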