Date: Thu, 30 Nov 2023 04:05:12 -0800
From: Souradeep Chakrabarti
To: Yury Norov
Cc: Souradeep Chakrabarti, Jakub Kicinski, KY Srinivasan, Haiyang Zhang,
    wei.liu@kernel.org, Dexuan Cui, davem@davemloft.net, edumazet@google.com,
    pabeni@redhat.com, Long Li, sharmaajay@microsoft.com, leon@kernel.org,
    cai.huoqing@linux.dev, ssengar@linux.microsoft.com, vkuznets@redhat.com,
    tglx@linutronix.de, linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, Paul Rosswurm
Subject: Re: [EXTERNAL] Re: [PATCH V2 net-next] net: mana: Assigning IRQ affinity on HT cores
Message-ID: <20231130120512.GA15408@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <1700574877-6037-1-git-send-email-schakrabarti@linux.microsoft.com>
 <20231121154841.7fc019c8@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
On Wed, Nov 29, 2023 at 06:16:17PM -0800, Yury Norov wrote:
> On Mon, Nov 27, 2023 at 09:36:38AM +0000, Souradeep Chakrabarti wrote:
> >
> > >-----Original Message-----
> > >From: Jakub Kicinski
> > >Sent: Wednesday, November 22, 2023 5:19 AM
> > >To: Souradeep Chakrabarti
> > >Cc: KY Srinivasan; Haiyang Zhang; wei.liu@kernel.org; Dexuan Cui;
> > >davem@davemloft.net; edumazet@google.com; pabeni@redhat.com; Long Li;
> > >sharmaajay@microsoft.com; leon@kernel.org; cai.huoqing@linux.dev;
> > >ssengar@linux.microsoft.com; vkuznets@redhat.com; tglx@linutronix.de;
> > >linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> > >linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
> > >Souradeep Chakrabarti; Paul Rosswurm
> > >Subject: [EXTERNAL] Re: [PATCH V2 net-next] net: mana: Assigning IRQ
> > >affinity on HT cores
> > >
> > >On Tue, 21 Nov 2023 05:54:37 -0800 Souradeep Chakrabarti wrote:
> > >> The existing MANA design assigns an IRQ to every CPU, including sibling
> > >> hyper-threads in a core. This causes multiple IRQs to be serviced on the
> > >> same core and may reduce the network performance with RSS.
> > >>
> > >> Improve the performance by adhering to the configuration recommended
> > >> for RSS, which assigns one IRQ per HT core.
> > >
> > >Drivers should not have to carry 120 LoC for something as basic as
> > >spreading IRQs. Please take a look at include/linux/topology.h and if
> > >there's nothing that fits your needs there - add it. That way other
> > >drivers can reuse it.
> >
> > Because of the current design idea, it is easier to keep things inside
> > the mana driver code here, as the idea of the IRQ distribution is:
> > 1) Loop through the interrupts to assign a CPU to each.
> > 2) Find a non-sibling online CPU on the local NUMA node and assign the
> >    IRQ to it.
> > 3) If the number of IRQs is more than the number of non-sibling CPUs on
> >    that NUMA node, assign the remainder to sibling CPUs of that node.
> > 4) Keep doing this until all online CPUs are used or no IRQs remain.
> > 5) If all CPUs on that node are used, go to the next NUMA node with CPUs
> >    and repeat steps 2 and 3.
> > 6) If all CPUs on all NUMA nodes are used but IRQs remain, wrap around to
> >    the first local NUMA node and continue steps 2, 3 and 4 until all IRQs
> >    are assigned.
>
> Hi Souradeep,
>
> (Thanks Jakub for sharing this thread with me)
>
> If I understand your intention right, you can leverage the existing
> cpumask_local_spread().
>
> But I think I've got something better for you. The below series adds
> a for_each_numa_cpu() iterator, which may help you do most of the
> job without messing with node internals.
>
> https://lore.kernel.org/netdev/ZD3l6FBnUh9vTIGc@yury-ThinkPad/T/
>
Thanks Yury and Jakub. I was trying to find this patch, but I am unable to
find it in that thread or in net-next. Can you please tell me whether it has
been committed? If not, can you please point me to the correct patch for this
macro? It would be really helpful.
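
For what it is worth, the simplest thing on our side would be plain
cpumask_local_spread(), roughly like the untested sketch below ('irqs' and
'nvec' stand for the IRQ array and its length as in the patch further down,
and 'node' for the device-local NUMA node). The problem is that it does not
avoid HT siblings, which is exactly what this patch is trying to do:

	for (i = 0; i < nvec; i++) {
		/* Pick the i-th online CPU, closest to 'node' first. */
		unsigned int cpu = cpumask_local_spread(i, node);

		irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu));
	}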
> By using it, the pseudocode implementing your algorithm may look
> like this:
>
>	unsigned int cpu, hop;
>	unsigned int irq = 0;
>
> again:
>	cpu = get_cpu();
>	node = cpu_to_node(cpu);
>	cpumask_copy(cpus, cpu_online_mask);
>
>	for_each_numa_cpu(cpu, hop, node, cpus) {
>		/* All siblings are the same for IRQ spreading purpose */
>		irq_set_affinity_and_hint(irq, topology_sibling_cpumask());
>
>		/* One IRQ per sibling group */
>		cpumask_andnot(cpus, cpus, topology_sibling_cpumask());
>
>		if (++irq == num_irqs)
>			break;
>	}
>
>	if (irq < num_irqs)
>		goto again;
>
> (Completely not tested, just an idea.)
>
I have done a similar kind of change for our driver, but the constraint here
is that the total number of IRQs can be equal to the total number of online
CPUs in some setups: it is either equal to the number of online CPUs, or
capped at 64 IRQs when there are more online CPUs than that.
So my proposed change is the following:

+static int irq_setup(int *irqs, int nvec, int start_numa_node)
+{
+	cpumask_var_t node_cpumask;
+	int i, cpu, err = 0;
+	unsigned int next_node;
+	cpumask_t visited_cpus;
+	unsigned int start_node = start_numa_node;
+
+	i = 0;
+	if (!alloc_cpumask_var(&node_cpumask, GFP_KERNEL)) {
+		err = -ENOMEM;
+		goto free_mask;
+	}
+	cpumask_clear(&visited_cpus);
+
+	for_each_next_node_with_cpus(start_node, next_node) {
+		cpumask_copy(node_cpumask, cpumask_of_node(next_node));
+		/* Assign one IRQ per non-sibling CPU of this node first. */
+		for_each_cpu(cpu, node_cpumask) {
+			cpumask_andnot(node_cpumask, node_cpumask,
+				       topology_sibling_cpumask(cpu));
+			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu));
+			if (++i == nvec)
+				goto free_mask;
+			cpumask_set_cpu(cpu, &visited_cpus);
+			/*
+			 * All sibling groups used but the node still has
+			 * unvisited CPUs: refill with the skipped siblings.
+			 */
+			if (cpumask_empty(node_cpumask) &&
+			    cpumask_weight(&visited_cpus) < nr_cpus_node(next_node)) {
+				cpumask_copy(node_cpumask, cpumask_of_node(next_node));
+				cpumask_andnot(node_cpumask, node_cpumask, &visited_cpus);
+				cpu = cpumask_first(node_cpumask);
+			}
+		}
+		/* Ran out of nodes: wrap around to the first online node. */
+		if (next_online_node(next_node) == MAX_NUMNODES)
+			next_node = first_online_node;
+	}
+free_mask:
+	free_cpumask_var(node_cpumask);
+	return err;
+}

I can definitely use for_each_numa_cpu() instead of my proposed
for_each_next_node_with_cpus() macro here, and that will make it cleaner
(see the sketch in the P.S. below). Thanks for the suggestion.

> Thanks,
> Yury
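
P.S. For discussion, a rough and completely untested sketch of what
irq_setup() might collapse to on top of the proposed for_each_numa_cpu()
iterator (assuming it lands with the signature used in your pseudocode
above). The nvec == num_online_cpus() case is handled by a second pass that
picks up the HT siblings skipped in the first one:

	static int irq_setup(int *irqs, int nvec, int start_node)
	{
		cpumask_var_t cpus, assigned;
		int cpu, hop, i = 0;

		if (!zalloc_cpumask_var(&assigned, GFP_KERNEL))
			return -ENOMEM;
		if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) {
			free_cpumask_var(assigned);
			return -ENOMEM;
		}

		while (i < nvec) {
			/* Online CPUs that do not have an IRQ yet. */
			cpumask_andnot(cpus, cpu_online_mask, assigned);
			if (cpumask_empty(cpus))
				break;	/* more IRQs than online CPUs */

			/* Walk CPUs in increasing NUMA distance from start_node. */
			for_each_numa_cpu(cpu, hop, start_node, cpus) {
				irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu));
				cpumask_set_cpu(cpu, assigned);

				/* One IRQ per sibling group per pass. */
				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));

				if (++i == nvec)
					break;
			}
		}

		free_cpumask_var(cpus);
		free_cpumask_var(assigned);
		return 0;
	}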