Received: by 2002:a05:7412:e794:b0:fa:551:50a7 with SMTP id o20csp3016188rdd; Sat, 13 Jan 2024 11:12:09 -0800 (PST) X-Google-Smtp-Source: AGHT+IHfgm2kxUyOaA46lq8Ngmrkjqg+Q4eKs+skGUkNk1+rDrACRUYiWSAhoY5nNZHLKxKzh3P3 X-Received: by 2002:a05:6214:1c4a:b0:681:30a6:dbcf with SMTP id if10-20020a0562141c4a00b0068130a6dbcfmr4130202qvb.49.1705173129279; Sat, 13 Jan 2024 11:12:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1705173129; cv=none; d=google.com; s=arc-20160816; b=YURwZeavSQJLIrp9OFha6H3/8miyklrukHCfr36Zyllt62EXg2NsxalBPVkvSPTlKs +yc3FhwcsUpgvA3lXWccYKYoUEhW6IKbgKz5waefAq7yPgjCuj1YTkV7Q98zdVIMqXU5 VCyoIpbVSUhe9PL0FQg+nXiaMapdUwyxDccVkNDN+eAE0z3v41g4rQL3/0iocr7rRFWv WKo/X08a6mCyqiSl3527oeimhCy3mNU7RmgieepucCdYt6l+fY59TINlvMOjI1I4fnSe cs7ia+MX4ZkjLTmWsVG53W++MebgUtKaCxdJA+I0ug1PVofmWMJQIAJoA/FcmP62NQaq ku+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=lZUYQq6ZeHPwpQSptoecLeEtOwBdqH6kWgYuNYaqluQ=; fh=/PQITZO7/DuWDa30FHDpnEunVwNOLX2OSz8iRGhdWRk=; b=zlJHwP8sQhCgK8nScuHl8CPfH7F0mB2bpoVHeBlQkAXVyNPT+x/EuuR8C3brZoh8NP DjTq+EzLleNKsE8T/TDM8NV7axLfMi2UWrjVaDFuZQ26VhOTFxNJfE7qJx6NDKV70wbM MNiI+Uiu8GquCAT6RuWph6/4X3MmlcbAUTk2wE24DSXRMXsz+CtcC9RpyF+7UK4CYsa6 zSJL+Z0qjtG9g8SDDqavUosYwuggr69uCNowQ0hYod0fucGQh+XkMF5LVwQCJ/nnaRdW eNTLVzh+ZJy4ebke/urNkqr+b0/gtZqKNlBUYPmE9QBAps8xTuRwEgIuN9qERZkUMKEC eLiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=JilLmU2u; spf=pass (google.com: domain of linux-kernel+bounces-25360-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-25360-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id t9-20020a05620a0b0900b007833f1d19a8si4992566qkg.569.2024.01.13.11.12.09 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Jan 2024 11:12:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-25360-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=JilLmU2u; spf=pass (google.com: domain of linux-kernel+bounces-25360-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-25360-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id BFBB41C20FEB for ; Sat, 13 Jan 2024 19:12:08 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3350C6AAB; Sat, 13 Jan 2024 19:11:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JilLmU2u" Received: from mail-yw1-f173.google.com (mail-yw1-f173.google.com [209.85.128.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C10F3610D; Sat, 13 Jan 2024 19:11:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-f173.google.com with SMTP id 00721157ae682-5e784ce9bb8so60024657b3.0; Sat, 13 Jan 2024 11:11:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705173111; x=1705777911; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=lZUYQq6ZeHPwpQSptoecLeEtOwBdqH6kWgYuNYaqluQ=; b=JilLmU2uWeT6vQscQbOpFh9/im8K4J2IxSt/IfvvVHl6UhxteD9sFiEOPmG+FpnsSI qNDR6LIgTjU5uKhZDpUmjdlSUUVsnokSSLgwuYtAv4346kLZGvQI/Cx8/7xz2787x6pt fx65mwEPajoEKpOPTzwA4SK0XJXLpc6f7sp/LuvK+ekRl/F5XFW107QRSnk3uOU1FLOf OmADZEfM1cyK8MFxi/DIGfWKlZgwrOc/F6m6kqA23E1Ni/HFer9zy9phU7tnl8gABYP3 ISlpu8v83y3GV8oZ7KsHM52e1RuBKlbIJi3iL/WrwswKl/q+IaeJ6fOruyjcOu/nYa20 qOYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705173111; x=1705777911; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=lZUYQq6ZeHPwpQSptoecLeEtOwBdqH6kWgYuNYaqluQ=; b=EG6Jps995Cs8SKVkkGNrEHGUzIFn0l6rofz4j5lrqL+HzFj/u5PhndwW90qPFkMpMh mj4zS6fEEFH3eyToU9jIbK8Bw78ShYTbeCLnfdIBL50J/MPVQQ+P1qvN9+wqBxrUePaA U/KcujxWjmB9DFBauHfduNxoRzBlW9B4XncuN8HJ81+b9fyJHAGCzGvVrx80kUEDkt2C 9NM7A3X2M0hmTlJI7s64pzOLIhjLw5cR+DlZgaPmNvsPFMmH7j8WUYMUOPZHBjkrPyUI uG2vRW0zVQz4oFY+XGlT9EuuPXWdIu62P/YnY3EJQvqrPWf9ygVlY5wtSNTQ5WYQRKCg xjtw== X-Gm-Message-State: AOJu0YyHQjgJSDPqA8L32hHJtN+1lzFyULP1D4FCsy13cGg3bIJxhcif bE6/gIajgnl+uXdgm8hZlDu/Bt4Jn2pYsw== X-Received: by 2002:a81:5482:0:b0:5ef:f4a4:2f4 with SMTP id i124-20020a815482000000b005eff4a402f4mr2624249ywb.12.1705173111555; Sat, 13 Jan 2024 11:11:51 -0800 (PST) Received: from localhost ([2601:344:8301:57f0:33b2:8901:1c90:18a2]) by smtp.gmail.com with ESMTPSA id v141-20020a814893000000b005fb420b14dcsm2346352ywa.108.2024.01.13.11.11.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Jan 2024 11:11:50 -0800 (PST) Date: Sat, 13 Jan 2024 11:11:50 -0800 From: Yury Norov To: Michael Kelley Cc: Souradeep Chakrabarti , Haiyang Zhang , KY Srinivasan , "wei.liu@kernel.org" , Dexuan Cui , "davem@davemloft.net" , "edumazet@google.com" , "kuba@kernel.org" , "pabeni@redhat.com" , Long Li , "leon@kernel.org" , "cai.huoqing@linux.dev" , "ssengar@linux.microsoft.com" , "vkuznets@redhat.com" , "tglx@linutronix.de" , "linux-hyperv@vger.kernel.org" , "netdev@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-rdma@vger.kernel.org" , Souradeep Chakrabarti , Paul Rosswurm Subject: Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs Message-ID: References: <1704797478-32377-1-git-send-email-schakrabarti@linux.microsoft.com> <1704797478-32377-4-git-send-email-schakrabarti@linux.microsoft.com> <20240111061319.GC5436@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> <20240113063038.GD5436@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Sat, Jan 13, 2024 at 04:20:31PM +0000, Michael Kelley wrote: > From: Souradeep Chakrabarti Sent: Friday, January 12, 2024 10:31 PM > > > On Fri, Jan 12, 2024 at 06:30:44PM +0000, Haiyang Zhang wrote: > > > > > > > -----Original Message----- > > > From: Michael Kelley Sent: Friday, January 12, 2024 11:37 AM > > > > > > > > From: Souradeep Chakrabarti Sent: > > > > Wednesday, January 10, 2024 10:13 PM > > > > > > > > > > The test topology was used to check the performance between > > > > > cpu_local_spread() and the new approach is : > > > > > Case 1 > > > > > IRQ Nodes Cores CPUs > > > > > 0 1 0 0-1 > > > > > 1 1 1 2-3 > > > > > 2 1 2 4-5 > > > > > 3 1 3 6-7 > > > > > > > > > > and with existing cpu_local_spread() > > > > > Case 2 > > > > > IRQ Nodes Cores CPUs > > > > > 0 1 0 0 > > > > > 1 1 0 1 > > > > > 2 1 1 2 > > > > > 3 1 1 3 > > > > > > > > > > Total 4 channels were used, which was set up by ethtool. > > > > > case 1 with ntttcp has given 15 percent better performance, than > > > > > case 2. During the test irqbalance was disabled as well. > > > > > > > > > > Also you are right, with 64CPU system this approach will spread > > > > > the irqs like the cpu_local_spread() but in the future we will offer > > > > > MANA nodes, with more than 64 CPUs. There it this new design will > > > > > give better performance. > > > > > > > > > > I will add this performance benefit details in commit message of > > > > > next version. > > > > > > > > Here are my concerns: > > > > > > > > 1. The most commonly used VMs these days have 64 or fewer > > > > vCPUs and won't see any performance benefit. > > > > > > > > 2. Larger VMs probably won't see the full 15% benefit because > > > > all vCPUs in the local NUMA node will be assigned IRQs. For > > > > example, in a VM with 96 vCPUs and 2 NUMA nodes, all 48 > > > > vCPUs in NUMA node 0 will all be assigned IRQs. The remaining > > > > 16 IRQs will be spread out on the 48 CPUs in NUMA node 1 > > > > in a way that avoids sharing a core. But overall the means > > > > that 75% of the IRQs will still be sharing a core and > > > > presumably not see any perf benefit. > > > > > > > > 3. Your experiment was on a relatively small scale: 4 IRQs > > > > spread across 2 cores vs. across 4 cores. Have you run any > > > > experiments on VMs with 128 vCPUs (for example) where > > > > most of the IRQs are not sharing a core? I'm wondering if > > > > the results with 4 IRQs really scale up to 64 IRQs. A lot can > > > > be different in a VM with 64 cores and 2 NUMA nodes vs. > > > > 4 cores in a single node. > > > > > > > > 4. The new algorithm prefers assigning to all vCPUs in > > > > each NUMA hop over assigning to separate cores. Are there > > > > experiments showing that is the right tradeoff? What > > > > are the results if assigning to separate cores is preferred? > > > > > > I remember in a customer case, putting the IRQs on the same > > > NUMA node has better perf. But I agree, this should be re-tested > > > on MANA nic. > > > > 1) and 2) The change will not decrease the existing performance, but for > > system with high number of CPU, will be benefited after this. > > > > 3) The result has shown around 6 percent improvement. > > > > 4)The test result has shown around 10 percent difference when IRQs are > > spread on multiple numa nodes. > > OK, this looks pretty good. Make clear in the commit messages what > the tradeoffs are, and what the real-world benefits are expected to be. > Some future developer who wants to understand why IRQs are assigned > this way will thank you. :-) I agree with Michael, this needs to be spoken aloud. From the above, is that correct that the best performance is achieved when the # of IRQs is half the nubmer of CPUs in the 1st node, because this configuration allows to spread IRQs across cores the most optimal way? And if we have more or less than that, it hurts performance, at least for MANA networking? So, the B|A performance chart may look like this, right? irq nodes cores cpus perf 0 1 | 1 0 | 0 0 | 0-1 0% 1 1 | 1 0 | 1 1 | 2-3 +5% 2 1 | 1 1 | 2 2 | 4-5 +10% 3 1 | 1 1 | 3 3 | 6-7 +15% 4 1 | 1 0 | 4 3 | 0-1 +12% ... | | | 7 1 | 1 1 | 7 3 | 6-7 0% ... 15 2 | 2 3 | 3 15 | 14-15 0% Souradeep, can you please confirm that my understanding is correct? In v5, can you add a table like the above with real performance numbers for your driver? I think that it would help people to configure their VMs better when networking is a bottleneck. Thanks, Yury