Date: Tue, 12 Nov 2019 12:56:30 +0100
From: Michal Hocko
To: Shaokun Zhang
Cc: linux-kernel@vger.kernel.org, yuqi jin, Andrew Morton, Mike Rapoport,
    Paul Burton, Michael Ellerman, Anshuman Khandual, netdev@vger.kernel.org
Subject: Re: [PATCH v3] lib: optimize cpumask_local_spread()
Message-ID: <20191112115630.GD2763@dhcp22.suse.cz>
References: <1573091048-10595-1-git-send-email-zhangshaokun@hisilicon.com>
 <20191108103102.GF15658@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org
On Mon 11-11-19 10:02:37, Shaokun Zhang wrote:
> Hi Michal,
> 
> On 2019/11/8 18:31, Michal Hocko wrote:
> > This changelog looks better, thanks! I still have some questions though.
> > Btw. cpumask_local_spread is used by the networking code but I do not
> > see net guys involved (Cc netdev)
> 
> Oh, I forgot to involve the net guys, sorry.
> 
> > On Thu 07-11-19 09:44:08, Shaokun Zhang wrote:
> >> From: yuqi jin
> >>
> >> On a multi-processor NUMA system, an I/O driver has to find the CPU
> >> cores to which its IRQs shall be bound. When the CPU cores in the
> >> local NUMA node have all been used, it is better to pick the node
> >> closest to the local node instead of choosing any online CPU
> >> immediately.
> >>
> >> On the Huawei Kunpeng 920 server, there are 4 NUMA nodes (0-3) in the
> >> 2-CPU system (0-1).
> > 
> > Please send a topology of this server (numactl -H).
> 
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
> node 0 size: 63379 MB
> node 0 free: 61899 MB
> node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> node 1 size: 64509 MB
> node 1 free: 63942 MB
> node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
> node 2 size: 64509 MB
> node 2 free: 63056 MB
> node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
> node 3 size: 63997 MB
> node 3 free: 63420 MB
> node distances:
> node   0   1   2   3
>   0:  10  16  32  33
>   1:  16  10  25  32
>   2:  32  25  10  16
>   3:  33  32  16  10
> 
> >> We performed a PS (parameter server) business test; the behavior of
> >> the service is that the client initiates a request through the
> >> network card, and the server responds to the request after
> >> calculation.
> > 
> > Is the benchmark publicly available?
> 
> Sorry, the PS we test is not open, but I think redis behaves the same as
> PS at the macro level. When there are 24 redis servers on each of node2
> and node3, and the 24-47 IRQs and XPS of the NIC are not bound to node3,
> the redis servers on node3 will not perform well.

Are there any other benchmarks showing improvements?

> >> When two PS processes run on node2 and node3 separately and the
> >> network card is located on 'node2', which is in cpu1, the performance
> >> of node2 (26W QPS) and node3 (22W QPS) differed.
> >> It would be better if the NIC queues were bound to the cpu1 cores in
> >> turn; then XPS would also be properly initialized. However,
> >> cpumask_local_spread only considers the local node: when the number
> >> of NIC queues exceeds the number of cores in the local node, it
> >> simply returns any online core. So when the PS running on node3 sends
> >> a calculated request, the performance is not as good as on node2.
> >> Since the NIC and other I/O devices initialize their interrupt
> >> binding this way, it is reasonable, once the cores of the local node
> >> are used up, to fall back to the node closest to it.
> > 
> > Can you post cpu affinities before and after this patch?
> 
> Before this patch:
> Euler:/sys/bus/pci/devices/0000:7d:00.2 # cat numa_node
> 2
> Euler:~ # cat /proc/irq/345/smp_affinity    #IRQ0
> 00000000,00010000,00000000

This representation is awkward to parse. Could you add smp_affinity_list
please? It would save quite some head scratching.
-- 
Michal Hocko
SUSE Labs
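
For reference, below is a minimal sketch (plain kernel-style C, not the
actual patch) of the nearest-node spreading idea described in the changelog
above: once the CPUs of the requested node are used up, hand out CPUs from
the remaining online nodes in order of increasing node_distance(), instead
of from an arbitrary online CPU. The helper names are the usual
cpumask/nodemask API; the NUMA_NO_NODE case and CPU-hotplug locking are
deliberately left out, and the function name is made up for illustration.

#include <linux/cpumask.h>
#include <linux/nodemask.h>
#include <linux/topology.h>

/*
 * Illustrative sketch only: return the i-th online CPU, visiting online
 * nodes in order of increasing distance from @node.
 */
static int spread_to_nearest_node(unsigned int i, int node)
{
	nodemask_t visited = NODE_MASK_NONE;
	int cpu;

	/* Wrap so that some online CPU is always returned. */
	i %= num_online_cpus();

	for (;;) {
		int n, best = NUMA_NO_NODE;

		/* Closest unvisited node; the local node itself comes first. */
		for_each_online_node(n) {
			if (node_isset(n, visited))
				continue;
			if (best == NUMA_NO_NODE ||
			    node_distance(node, n) < node_distance(node, best))
				best = n;
		}
		if (best == NUMA_NO_NODE)
			break;
		node_set(best, visited);

		/* Hand out the online CPUs of that node in order. */
		for_each_cpu_and(cpu, cpumask_of_node(best), cpu_online_mask)
			if (i-- == 0)
				return cpu;
	}

	/* Not expected to be reached after the modulo; safe fallback. */
	return cpumask_first(cpu_online_mask);
}

This recomputes the distance ordering on every call; since
cpumask_local_spread() is normally called once per IRQ/queue at setup time
that may be acceptable, but a real implementation could precompute the
sorted node list instead. As for the affinity dump quoted above:
smp_affinity is a comma-separated series of 32-bit hex words, most
significant first, so 00000000,00010000,00000000 appears to set only bit 48,
i.e. CPU 48, the first core of node 2, which is what smp_affinity_list would
show directly.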