From: Ming Lei
Date: Fri, 31 Aug 2018 14:54:07 +0800
Subject: Re: Affinity managed interrupts vs non-managed interrupts
To: sumit.saxena@broadcom.com
Cc: Ming Lei, Thomas Gleixner, Christoph Hellwig,
    Linux Kernel Mailing List, Kashyap Desai,
    shivasharan.srikanteshwara@broadcom.com, linux-block
In-Reply-To: <300d6fef733ca76ced581f8c6304bac6@mail.gmail.com>
References: <20180829084618.GA24765@ming.t460p>
    <300d6fef733ca76ced581f8c6304bac6@mail.gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 29, 2018 at 6:47 PM Sumit Saxena wrote:
>
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Wednesday, August 29, 2018 2:16 PM
> > To: Sumit Saxena
> > Cc: tglx@linutronix.de; hch@lst.de; linux-kernel@vger.kernel.org
> > Subject: Re: Affinity managed interrupts vs non-managed interrupts
> >
> > Hello Sumit,
>
> Hi Ming,
> Thanks for the response.
> >
> > On Tue, Aug 28, 2018 at 12:04:52PM +0530, Sumit Saxena wrote:
> > > Affinity managed interrupts vs non-managed interrupts
> > >
> > > Hi Thomas,
> > >
> > > We are working on a next-generation MegaRAID product where the
> > > requirement is to allocate an additional 16 MSI-x vectors on top of
> > > the MSI-x vectors the megaraid_sas driver usually allocates. The
> > > MegaRAID adapter supports 128 MSI-x vectors.
> > >
> > > To explain the requirement and the solution, consider a 2-socket
> > > system (each socket having 36 logical CPUs). The current driver
> > > allocates a total of 72 MSI-x vectors by calling
> > > pci_alloc_irq_vectors() with the PCI_IRQ_AFFINITY flag. All 72
> > > MSI-x vectors have their affinity spread across the NUMA nodes and
> > > the interrupts are affinity managed.
> > >
> > > If the driver calls pci_alloc_irq_vectors_affinity() with
> > > pre_vectors = 16 instead, it can allocate 16 + 72 MSI-x vectors.
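
If I am reading the above correctly, the allocation you describe
corresponds roughly to the sketch below. This is only to confirm my
understanding; the vector counts are taken from your example, error
handling is simplified, and it is not the actual megaraid_sas code:

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Sketch only, not the actual megaraid_sas code. */
static int example_alloc_msix(struct pci_dev *pdev)
{
        struct irq_affinity desc = {
                .pre_vectors = 16,  /* extra reply queues, excluded from spreading */
        };
        int nvec;

        /*
         * The 16 pre_vectors are left out of the affinity spread and
         * keep the default affinity mask; the remaining 72 vectors are
         * managed and spread across the CPUs of both NUMA nodes.
         */
        nvec = pci_alloc_irq_vectors_affinity(pdev, 16 + 72, 16 + 72,
                                              PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                              &desc);
        return nvec < 0 ? nvec : 0;
}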
> >
> > Could you explain a bit what the specific use case for the extra 16
> > vectors is?
>
> We are trying to avoid the penalty of one interrupt per IO completion,
> so we decided to coalesce interrupts on these extra 16 reply queues.
> For the regular 72 reply queues we will not coalesce interrupts,
> because under a low IO workload interrupt coalescing may add latency
> since there are fewer IO completions.
> In the IO submission path, the driver will decide which set of reply
> queues (either the extra 16 reply queues or the regular 72 reply
> queues) to pick based on the IO workload.

I am just wondering how you can make the decision about using the extra
16 or the regular 72 queues in the submission path. Could you share a
bit of your idea? How are you going to recognize the IO workload inside
your driver? Even the current block layer doesn't recognize IO
workload, such as random IO or sequential IO.

Frankly speaking, you may be able to reuse the 72 reply queues for
interrupt coalescing by configuring one extra register to enable the
coalescing mode, and just use a small part of the 72 reply queues while
in that mode.

Or you can learn from SPDK and use one, or a small number of, dedicated
cores or kernel threads to poll the interrupts from all reply queues;
then I guess you may benefit a lot compared with the extra 16 queue
approach.

Introducing 16 extra queues just for interrupt coalescing and making
them coexist with the regular 72 reply queues seems like a very unusual
use case, and I am not sure the current genirq affinity code can
support it well.

> > >
> > > All pre_vectors (16) will be mapped to all available online CPUs,
> > > but the effective affinity of each vector is CPU 0. Our requirement
> > > is for the 16 pre_vectors reply queues to be mapped to the local
> > > NUMA node, with the effective CPUs spread within the local node's
> > > cpumask. Without changing kernel code, we can
> >
> > If all CPUs in one NUMA node are offline, can this use case work as
> > expected? Seems we have to understand what the use case is and how
> > it works.
>
> Yes, if all CPUs of the NUMA node are offlined, the IRQ-CPU affinity
> will be broken and irqbalance takes care of migrating the affected
> IRQs to online CPUs of a different NUMA node.
> When the offline CPUs are onlined again, irqbalance restores the
> affinity.

The irqbalance daemon can't cover managed interrupts, or do you mean
you don't use pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY)?

Thanks,
Ming Lei
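
P.S. Just to make my question about the submission-path decision
concrete: do you mean a heuristic along the lines of the sketch below?
Everything in it (the structure, the helper, the threshold) is made up
for illustration and is not from megaraid_sas; I only want to
understand what kind of signal the driver would key off.

#include <linux/atomic.h>
#include <linux/smp.h>
#include <linux/types.h>

/* Hypothetical structure and threshold, purely for illustration. */
#define EXAMPLE_COALESCE_THRESHOLD      64

struct example_hba {
        atomic_t        outstanding_cmds;
        unsigned int    nr_regular_queues;      /* e.g. 72, affinity managed */
        unsigned int    nr_extra_queues;        /* e.g. 16, coalesced */
        u16             *reply_map;             /* per-CPU regular reply queue */
};

/* Route to a coalesced queue only when enough IO is in flight. */
static u16 example_pick_reply_queue(struct example_hba *hba)
{
        unsigned int cpu = raw_smp_processor_id();

        if (atomic_read(&hba->outstanding_cmds) >= EXAMPLE_COALESCE_THRESHOLD)
                return hba->nr_regular_queues + (cpu % hba->nr_extra_queues);

        return hba->reply_map[cpu];
}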