Date: Wed, 10 Jan 2024 01:09:31 -0800
From: Souradeep Chakrabarti
To: Yury Norov
Cc: Michael Kelley, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	longli@microsoft.com, leon@kernel.org, cai.huoqing@linux.dev,
	ssengar@linux.microsoft.com, vkuznets@redhat.com, tglx@linutronix.de,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	schakrabarti@microsoft.com, paulros@microsoft.com
Subject: Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs
Message-ID: <20240110090931.GB5436@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <1704797478-32377-1-git-send-email-schakrabarti@linux.microsoft.com>
 <1704797478-32377-4-git-send-email-schakrabarti@linux.microsoft.com>

On Tue, Jan 09, 2024 at 03:28:59PM -0800, Yury Norov wrote:
> Hi Michael,
>
> So, I'm just a guy who helped to formulate the heuristics in an
> itemized form and implement them using the existing kernel API.
> I have no access to MANA machines, and I ran no performance tests
> myself.
>
> On Tue, Jan 09, 2024 at 07:22:38PM +0000, Michael Kelley wrote:
> > From: Souradeep Chakrabarti Sent: Tuesday, January 9, 2024 2:51 AM
> > >
> > > From: Yury Norov
> > >
> > > Souradeep investigated that the driver performs faster if IRQs are
> > > spread on CPUs with the following heuristics:
> > >
> > > 1. No more than one IRQ per CPU, if possible;
> > > 2. NUMA locality is the second priority;
> > > 3. Sibling dislocality is the last priority.
> > >
> > > Let's consider this topology:
> > >
> > >     Node            0               1
> > >     Core        0       1       2       3
> > >     CPU        0   1   2   3   4   5   6   7
> > >
> > > The most performant IRQ distribution based on the above topology
> > > and heuristics may look like this:
> > >
> > >     IRQ     Nodes   Cores   CPUs
> > >     0       1       0       0-1
> > >     1       1       1       2-3
> > >     2       1       0       0-1
> > >     3       1       1       2-3
> > >     4       2       2       4-5
> > >     5       2       3       6-7
> > >     6       2       2       4-5
> > >     7       2       3       6-7
> >
> > I didn't pay attention to the detailed discussion of this issue
> > over the past 2 to 3 weeks during the holidays in the U.S., but
> > the above doesn't align with the original problem as I understood
> > it. I thought the original problem was to avoid putting IRQs on
> > both hyper-threads in the same core, and that the perf
> > improvements are based on that configuration. At least that's
> > what the commit message for Patch 4/4 in this series says.
>
> Yes, and the original distribution suggested by Souradeep looks very
> similar:
>
>     IRQ     Nodes   Cores   CPUs
>     0       1       0       0
>     1       1       1       2
>     2       1       0       1
>     3       1       1       3
>     4       2       2       4
>     5       2       3       6
>     6       2       2       5
>     7       2       3       7
>
> I just added a bit more flexibility, so that the kernel may pick any
> sibling for the IRQ. As I understand, both approaches have similar
> performance. Probably my fine-tuning added another half percent...
>
> Souradeep, can you please share the exact numbers on this?
>
> > The above chart results in 8 IRQs being assigned to the 8 CPUs,
> > probably with 1 IRQ per CPU.
> > At least on x86, if the affinity
> > mask for an IRQ contains multiple CPUs, matrix_find_best_cpu()
> > should balance the IRQ assignments between the CPUs in the mask.
> > So the original problem is still present because both hyper-threads
> > in a core are likely to have an IRQ assigned.
>
> That's what I think: if the topology makes us put IRQs in the
> same sibling group, the best thing we can do is to rely on existing
> balancing mechanisms in the hope that they will do their job well.
>
> > Of course, this example has 8 IRQs and 8 CPUs, so assigning an
> > IRQ to every hyper-thread may be the only choice. If that's the
> > case, maybe this just isn't a good example to illustrate the
> > original problem and solution.
>
> Yeah... This example illustrates the order of IRQ distribution.
> I really doubt that if we distribute IRQs like in the above example
> there would be any difference in performance. But I think it's quite
> a good illustration. I could write the title for the table like this:
>
>     The order of IRQ distribution for the best performance
>     based on [...] may look like this.
>
> > But even with a better example
> > where the # of IRQs is <= half the # of CPUs in a NUMA node,
> > I don't think the code below accomplishes the original intent.
> >
> > Maybe I've missed something along the way in getting to this
> > version of the patch. Please feel free to set me straight. :-)
>
> Hmm. So if the number of IRQs is half the # of CPUs in the node,
> which is 2 in the example above, the distribution will look like
> this:
>
>     IRQ     Nodes   Cores   CPUs
>     0       1       0       0-1
>     1       1       1       2-3
>
> And each IRQ belongs to a different sibling group. This follows
> the rules above.
>
> I think of it like we assign an IRQ to a group of 2 CPUs, so from
> the heuristic #1 perspective, each CPU is assigned with 1/2 of an
> IRQ.
>
> If I add one more IRQ, then according to the heuristics, NUMA locality
> trumps sibling dislocality, so we'd assign the IRQ to the same node, on any
> core.
> My algorithm assigns it to core #0:
>
>     2       1       0       0-1
>
> This doubles the # of IRQs for CPUs 0 and 1: from 1/2 to 1.
>
> The next IRQ should be assigned to the same node again, and we've got
> the only choice:
>
>     3       1       1       2-3
>
> Starting from IRQ #4, node #1 is full - each CPU is assigned
> exactly one IRQ - and heuristic #1 makes us switch to the other
> node; then we do the same thing there:
>
>     4       2       2       4-5
>     5       2       3       6-7
>     6       2       2       4-5
>     7       2       3       6-7
>
> So I think the algorithm is correct... Really hope the above makes
> sense. :) If so, I can add it to the commit message for patch #3.
>
> Nevertheless... Souradeep, in addition to the performance numbers, can
> you share your topology and actual IRQ distribution that gains 15%? I
> think it should be added to the patch #4 commit message.

Sure, I will add my topology in the #4 commit message. Thanks for the
suggestion.

> Thanks,
> Yury
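
[Editor's note: the distribution order Yury walks through above can be
reproduced with a small toy simulation. This is a hedged sketch only -- it
is NOT the kernel's cpumask-based implementation from the patch; the
topology, node numbering, and the `distribute_irqs` helper are hard-coded
assumptions taken from the example in the thread.]

```python
# Toy simulation of the three heuristics discussed above.
# NOT the actual kernel code -- just the distribution *order*:
# 2 NUMA nodes, 2 cores per node, 2 SMT siblings per core,
# matching the example topology in the thread.

def distribute_irqs(num_irqs, nodes):
    """Assign each IRQ an affinity mask (one core's sibling group).

    Heuristics, in priority order:
      1. No more than one IRQ per CPU, if possible;
      2. NUMA locality: fill the current node before moving on;
      3. Sibling dislocality: round-robin over cores inside a node.
    """
    assignment = []
    for node in nodes:                         # heuristic 2: node by node
        siblings_per_core = len(node[0])
        for _ in range(siblings_per_core):     # heuristic 1: up to 1 IRQ/CPU
            for core in node:                  # heuristic 3: spread over cores
                if len(assignment) == num_irqs:
                    return assignment
                assignment.append(core)
    return assignment  # more IRQs than CPUs needs a different strategy

# Node 1: core 0 -> CPUs 0-1, core 1 -> CPUs 2-3
# Node 2: core 2 -> CPUs 4-5, core 3 -> CPUs 6-7
nodes = [
    [(0, 1), (2, 3)],
    [(4, 5), (6, 7)],
]

for irq, cpus in enumerate(distribute_irqs(8, nodes)):
    print(f"IRQ {irq}: CPUs {cpus[0]}-{cpus[1]}")
```

With 8 IRQs this reproduces the first table (0-1, 2-3, 0-1, 2-3, then the
second node); with 2 IRQs it yields (0-1) and (2-3), i.e. one IRQ per
sibling group, matching the half-the-CPUs case above.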