Date: Mon, 15 Jan 2024 22:13:43 -0800
From: Souradeep Chakrabarti
To: Yury Norov
Cc: Michael Kelley, Haiyang Zhang, KY Srinivasan, wei.liu@kernel.org,
    Dexuan Cui, davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
    pabeni@redhat.com, Long Li, leon@kernel.org, cai.huoqing@linux.dev,
    ssengar@linux.microsoft.com, vkuznets@redhat.com, tglx@linutronix.de,
    linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
    Souradeep Chakrabarti, Paul Rosswurm
Subject: Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs
Message-ID: <20240116061343.GA24925@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <1704797478-32377-4-git-send-email-schakrabarti@linux.microsoft.com>
 <20240111061319.GC5436@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
 <20240113063038.GD5436@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Sat, Jan 13, 2024 at 11:11:50AM -0800, Yury Norov wrote:
> On Sat, Jan 13, 2024 at 04:20:31PM +0000, Michael Kelley wrote:
> > From: Souradeep Chakrabarti Sent: Friday, January 12, 2024 10:31 PM
> > >
> > > On Fri, Jan 12, 2024 at 06:30:44PM +0000, Haiyang Zhang wrote:
> > > >
> > > > -----Original Message-----
> > > > From: Michael Kelley Sent: Friday, January 12, 2024 11:37 AM
> > > > >
> > > > > From: Souradeep Chakrabarti Sent: Wednesday, January 10, 2024 10:13 PM
> > > > > >
> > > > > > The test topology used to check the performance between
> > > > > > cpu_local_spread() and the new approach is:
> > > > > >
> > > > > > Case 1
> > > > > > IRQ   Nodes   Cores   CPUs
> > > > > > 0     1       0       0-1
> > > > > > 1     1       1       2-3
> > > > > > 2     1       2       4-5
> > > > > > 3     1       3       6-7
> > > > > >
> > > > > > and with the existing cpu_local_spread():
> > > > > >
> > > > > > Case 2
> > > > > > IRQ   Nodes   Cores   CPUs
> > > > > > 0     1       0       0
> > > > > > 1     1       0       1
> > > > > > 2     1       1       2
> > > > > > 3     1       1       3
> > > > > >
> > > > > > A total of 4 channels was used, set up via ethtool. Case 1
> > > > > > with ntttcp gave 15 percent better performance than case 2.
> > > > > > irqbalance was disabled during the test.
> > > > > >
> > > > > > Also, you are right: on a 64-CPU system this approach spreads
> > > > > > the IRQs like cpu_local_spread(), but in the future we will
> > > > > > offer MANA nodes with more than 64 CPUs, where this new design
> > > > > > will give better performance.
> > > > > >
> > > > > > I will add these performance benefit details to the commit
> > > > > > message of the next version.
> > > > >
> > > > > Here are my concerns:
> > > > >
> > > > > 1. The most commonly used VMs these days have 64 or fewer
> > > > > vCPUs and won't see any performance benefit.
> > > > >
> > > > > 2. Larger VMs probably won't see the full 15% benefit because
> > > > > all vCPUs in the local NUMA node will be assigned IRQs. For
> > > > > example, in a VM with 96 vCPUs and 2 NUMA nodes, all 48
> > > > > vCPUs in NUMA node 0 will be assigned IRQs. The remaining
> > > > > 16 IRQs will be spread out on the 48 CPUs in NUMA node 1
> > > > > in a way that avoids sharing a core.
> > > > > But overall that means that 75% of the IRQs will still be
> > > > > sharing a core and presumably not see any perf benefit.
> > > > >
> > > > > 3. Your experiment was on a relatively small scale: 4 IRQs
> > > > > spread across 2 cores vs. across 4 cores. Have you run any
> > > > > experiments on VMs with 128 vCPUs (for example) where
> > > > > most of the IRQs are not sharing a core? I'm wondering if
> > > > > the results with 4 IRQs really scale up to 64 IRQs. A lot can
> > > > > be different in a VM with 64 cores and 2 NUMA nodes vs.
> > > > > 4 cores in a single node.
> > > > >
> > > > > 4. The new algorithm prefers assigning to all vCPUs in
> > > > > each NUMA hop over assigning to separate cores. Are there
> > > > > experiments showing that is the right tradeoff? What
> > > > > are the results if assigning to separate cores is preferred?
> > > >
> > > > I remember in a customer case, putting the IRQs on the same
> > > > NUMA node had better perf. But I agree, this should be re-tested
> > > > on the MANA NIC.
> > >
> > > 1) and 2) The change will not decrease the existing performance,
> > > but systems with a high number of CPUs will benefit from it.
> > >
> > > 3) The result has shown around 6 percent improvement.
> > >
> > > 4) The test result has shown around 10 percent difference when
> > > IRQs are spread across multiple NUMA nodes.
> >
> > OK, this looks pretty good. Make clear in the commit messages what
> > the tradeoffs are, and what the real-world benefits are expected to be.
> > Some future developer who wants to understand why IRQs are assigned
> > this way will thank you. :-)
>
> I agree with Michael, this needs to be spoken aloud.
>
> From the above, is it correct that the best performance is achieved
> when the # of IRQs is half the number of CPUs in the 1st node, because
> this configuration allows spreading IRQs across cores in the most
> optimal way?
> And if we have more or less than that, it hurts performance, at
> least for MANA networking?

It does not decrease performance relative to the current
cpu_local_spread(), but optimum performance comes when the node has
double the number of CPUs as IRQs (considering SMT==2). If the number
of CPUs equals the number of IRQs (that is, the number of CPUs is
<= 64), we see the same performance as the existing design with
cpu_local_spread(). If the node has more than 64 CPUs, then we get
better performance than cpu_local_spread().

> So, the B|A performance chart may look like this, right?
>
> irq   nodes   cores   cpus        perf
>  0    1 | 1   0 | 0    0 | 0-1      0%
>  1    1 | 1   0 | 1    1 | 2-3     +5%
>  2    1 | 1   1 | 2    2 | 4-5    +10%
>  3    1 | 1   1 | 3    3 | 6-7    +15%
>  4    1 | 1   0 | 4    3 | 0-1    +12%
> ...
>  7    1 | 1   1 | 7    3 | 6-7      0%
> ...
> 15    2 | 2   3 | 3   15 | 14-15    0%
>
> Souradeep, can you please confirm that my understanding is correct?
>
> In v5, can you add a table like the above with real performance
> numbers for your driver? I think it would help people to configure
> their VMs better when networking is a bottleneck.

I will share a chart in the next version of patch 3. Thanks for the
suggestion.

> Thanks,
> Yury
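
[Editor's note: the difference between the two placements discussed in this
thread can be sketched with a small Python model. This is a hypothetical
illustration only, not the MANA driver's actual cpumask code; the function
names and the 1-node / 4-core / SMT==2 topology are assumptions taken from
the test tables above.]

```python
# Hypothetical model of the two IRQ->CPU placement policies discussed
# above (NOT the actual kernel/mana implementation).

def local_spread(num_irqs, num_cpus):
    # Old cpu_local_spread()-style walk: IRQ i -> CPU i, so consecutive
    # IRQs land on SMT siblings of the same physical core.
    return [i % num_cpus for i in range(num_irqs)]

def spread_by_core(num_irqs, cores):
    # New policy (sketch): hand each IRQ a whole physical core (both
    # siblings), wrapping around only after every core has an IRQ.
    return [cores[i % len(cores)] for i in range(num_irqs)]

# Topology from the test above: 1 node, 4 cores, SMT==2 -> 8 CPUs.
cores = [(0, 1), (2, 3), (4, 5), (6, 7)]   # sibling CPU pairs per core

case2 = local_spread(4, 8)        # "Case 2": CPUs 0-3
case1 = spread_by_core(4, cores)  # "Case 1": one core pair per IRQ

def cores_used(cpus, cores):
    # Map assigned CPUs back to the set of physical cores they occupy.
    return {idx for cpu in cpus for idx, pair in enumerate(cores) if cpu in pair}

print(cores_used(case2, cores))   # {0, 1}: 4 IRQs crammed onto 2 cores
print(len(set(case1)))            # 4: each IRQ on its own core
```

On this model the old walk packs 4 IRQs onto 2 cores while the new walk
uses all 4, which is exactly the Case 2 vs. Case 1 contrast in the tables
that produced the reported 15 percent ntttcp difference.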