Date: Wed, 21 Oct 2020 17:02:24 -0700
From: Jakub Kicinski
To: Thomas Gleixner
Cc: Nitesh Narayan Lal, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
    linux-pci@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
    frederic@kernel.org, mtosatti@redhat.com, sassmann@redhat.com,
    jesse.brandeburg@intel.com, lihong.yang@intel.com, helgaas@kernel.org,
    jeffrey.t.kirsher@intel.com, jacob.e.keller@intel.com, jlelli@redhat.com,
    hch@infradead.org, bhelgaas@google.com, mike.marciniszyn@intel.com,
    dennis.dalessandro@intel.com, thomas.lendacky@amd.com, jiri@nvidia.com,
    mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, lgoncalv@redhat.com, Dave Miller,
    Magnus Karlsson, Saeed Mahameed
Subject: Re: [PATCH v4 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
Message-ID: <20201021170224.55aea948@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <877drj72cz.fsf@nanos.tec.linutronix.de>
References: <20200928183529.471328-1-nitesh@redhat.com>
    <20200928183529.471328-5-nitesh@redhat.com>
    <87v9f57zjf.fsf@nanos.tec.linutronix.de>
    <3bca9eb1-a318-1fc6-9eee-aacc0293a193@redhat.com>
    <87lfg093fo.fsf@nanos.tec.linutronix.de>
    <877drj72cz.fsf@nanos.tec.linutronix.de>

On Wed, 21 Oct 2020 22:25:48 +0200 Thomas Gleixner wrote:
> On Tue, Oct 20 2020 at 20:07, Thomas Gleixner wrote:
> > On Tue, Oct 20 2020 at 12:18, Nitesh Narayan Lal wrote:
> >> However, IMHO we would still need a logic to prevent the devices from
> >> creating excess vectors.
> >
> > Managed interrupts are preventing exactly that by pinning the interrupts
> > and queues to one or a set of CPUs, which prevents vector exhaustion on
> > CPU hotplug.
> >
> > Non-managed, yes that is and always was a problem. One of the reasons
> > why managed interrupts exist.
>
> But why is this only a problem for isolation? The very same problem
> exists vs. CPU hotplug and therefore hibernation.
>
> On x86 we have at max. 204 vectors available for device interrupts per
> CPU. So assuming the only device interrupt in use is networking, any
> machine which has more than 204 network interrupts (queues, aux ...)
> active will prevent the machine from hibernating.
>
> Aside from that, it's silly to have multiple queues targeted at a single
> CPU in case of hotplug. And that's not a theoretical problem. Some
> power management schemes shut down sockets when the utilization of a
> system is low enough, e.g. outside of working hours.
>
> The whole point of multi-queue is to have locality so that traffic from
> a CPU goes through the CPU-local queue. What's the point of having two
> or more queues on a CPU in case of hotplug?
>
> The right answer to this is to utilize managed interrupts and have
> corresponding logic in your network driver to handle CPU hotplug. When a
> CPU goes down, the queue which is associated with that CPU is quiesced
> and the interrupt core shuts down the relevant interrupt instead of
> moving it to an online CPU (which causes the whole vector exhaustion
> problem on x86). When the CPU comes online again, the interrupt is
> re-enabled in the core and the driver reactivates the queue.

I think the Mellanox folks made some forays into managed irqs, but I don't
remember / can't find the details now.

For networking the locality / queue-per-core model does not always work,
since incoming traffic is usually spread based on a hash. Many
applications perform better when network processing is done on a small
subset of CPUs and the application doesn't get interrupted every 100us.
So we do need extra user control here.

We have a bit of a uAPI problem, since people have grown to depend on
IRQ == queue == NAPI to configure their systems. "The right way" out
would be a proper API which allows associating queues with CPUs rather
than with IRQs; then we could use managed IRQs and solve many other
problems.

Such a new API has been in the works / under discussion for a while now.
(Magnus, keep me honest here if you disagree that the queue API solves
this.)
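
For reference, below is a minimal sketch of the managed-interrupt setup
Thomas describes, built on the existing pci_alloc_irq_vectors_affinity() /
PCI_IRQ_AFFINITY interface. Everything named "foo_*" (the device struct,
function, and queue accounting) is hypothetical and only for illustration;
the hotplug quiesce/reactivate half is left as a comment, since networking
has no common infrastructure for it yet.

/*
 * Hypothetical driver sketch: request managed MSI-X vectors so the irq
 * core spreads and pins one vector per queue/CPU, and shuts a vector
 * down (rather than migrating it) when its CPU goes offline.
 */
#include <linux/pci.h>
#include <linux/interrupt.h>

struct foo_dev {
	struct pci_dev *pdev;
	unsigned int nr_queues;		/* desired rx/tx queue pairs */
};

static int foo_alloc_managed_vectors(struct foo_dev *fd)
{
	/* Vector 0 stays unmanaged for device-wide events (link, errors). */
	struct irq_affinity affd = { .pre_vectors = 1 };
	int nvec;

	/*
	 * PCI_IRQ_AFFINITY makes the remaining vectors managed: the core
	 * spreads them across the online CPUs and pins them there, so a
	 * vector is shut down instead of moved when its CPU is unplugged.
	 */
	nvec = pci_alloc_irq_vectors_affinity(fd->pdev, 2, fd->nr_queues + 1,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
	if (nvec < 0)
		return nvec;

	fd->nr_queues = nvec - 1;

	/*
	 * The driver-side hotplug handling Thomas refers to would live
	 * elsewhere: on CPU offline, quiesce the queue whose vector
	 * targets that CPU (pci_irq_get_affinity(fd->pdev, 1 + q) gives
	 * the mask), and reactivate the queue when the CPU returns.
	 */
	return 0;
}

With managed vectors the irq core, not the driver or the user, decides the
spreading, which is also why a queue-to-CPU API (rather than per-IRQ
affinity tuning) would be needed to give users the control discussed above.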