Date: Wed, 26 Jul 2023 10:17:20 -0600
From: Keith Busch
To: Pratyush Yadav
Cc: Christoph Hellwig, Sagi Grimberg, Jens Axboe, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme-pci: do not set the NUMA node of device if it has none
References: <20230725110622.129361-1-ptyadav@amazon.de> <50a125da-95c8-3b9b-543a-016c165c745d@grimberg.me> <20230726131408.GA15909@lst.de>

On Wed, Jul 26, 2023 at 05:30:33PM +0200, Pratyush Yadav wrote:
> On Wed, Jul 26 2023, Christoph Hellwig wrote:
> > On Wed, Jul 26, 2023 at 10:58:36AM +0300, Sagi Grimberg wrote:
> >>>> For example, AWS EC2's i3.16xlarge instance does not expose NUMA
> >>>> information for the NVMe devices. This means all NVMe devices have
> >>>> NUMA_NO_NODE by default.
> >>>> Without this patch, random 4k read performance
> >>>> measured via fio on CPUs from node 1 (around 165k IOPS) is almost 50%
> >>>> less than CPUs from node 0 (around 315k IOPS). With this patch, CPUs on
> >>>> both nodes get similar performance (around 315k IOPS).
> >>>
> >>> irqbalance doesn't work with this driver though: the interrupts are
> >>> managed by the kernel. Is there some other reason to explain the perf
> >>> difference?
>
> Hmm, I did not know that. I have not gone and looked at the code but I
> think the same reasoning should hold, just with s/irqbalance/kernel. If
> the kernel IRQ balancer sees the device is on node 0, it would deliver
> its interrupts to CPUs on node 0.
>
> In my tests I can see that the interrupts for NVME queues are sent only
> to CPUs from node 0 without this patch. With this patch CPUs from both
> nodes get the interrupts.

Could you send the output of:

  numactl --hardware

and then with and without your patch:

  for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
    cat /proc/irq/$i/{smp,effective}_affinity_list; \
  done

?
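A note for anyone reproducing the investigation above: whether a controller ended up with NUMA_NO_NODE can also be checked directly from sysfs before looking at interrupt affinities. Below is a minimal userspace sketch, not from the original thread, which assumes the usual sysfs layout where /sys/class/nvme/<ctrl>/device points at the PCI device and exposes a numa_node attribute (-1 meaning NUMA_NO_NODE).

/*
 * Print the NUMA node a PCI NVMe controller reports via sysfs.
 * A value of -1 corresponds to NUMA_NO_NODE, i.e. the situation
 * described in the patch under discussion.  The sysfs path below is
 * an assumption based on the standard PCI device layout.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	const char *ctrl = (argc > 1) ? argv[1] : "nvme0";
	char path[256];
	FILE *f;
	int node;

	snprintf(path, sizeof(path), "/sys/class/nvme/%s/device/numa_node", ctrl);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return EXIT_FAILURE;
	}
	if (fscanf(f, "%d", &node) != 1) {
		fprintf(stderr, "%s: could not parse numa_node\n", path);
		fclose(f);
		return EXIT_FAILURE;
	}
	fclose(f);

	if (node == -1)	/* NUMA_NO_NODE */
		printf("%s: no NUMA node reported (NUMA_NO_NODE)\n", ctrl);
	else
		printf("%s: NUMA node %d\n", ctrl, node);
	return EXIT_SUCCESS;
}

Build and run with something like "cc -o nvme_numa_check nvme_numa_check.c && ./nvme_numa_check nvme0" (the file name is only illustrative); the same value can be read with a plain cat of the numa_node file, but the explicit -1 check makes the NUMA_NO_NODE case obvious.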