Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp2643426ybz; Mon, 27 Apr 2020 01:29:45 -0700 (PDT) X-Google-Smtp-Source: APiQypKostuVFIocWoJKIt1bJrad4BTl3+VWRknLNrZk+9T4vihhWZASZULVt++vUWVZ9Av/Hvkj X-Received: by 2002:a17:906:3607:: with SMTP id q7mr9253178ejb.313.1587976185384; Mon, 27 Apr 2020 01:29:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1587976185; cv=none; d=google.com; s=arc-20160816; b=OM3qJUhttPi0WgEGCitzTPmiETN+kMubs4VQKbFjTYNJJtIHvPlsm4VRLsuDrtuiUL lh5krZZkrt0jbcZxqqmt4R/rFz2B9GjbxclPZykm4o5MKDYko76fDeVxYrteouma7Twt /S60mBJAvNFX5323/Z+JtMgsSeDTmpoMkNXWTUMy1KxoeoU7ztlM3VJJ+obXSg7DS9E3 KQiaEyP5OChdgZ37PuFP4ODOjZUqGQIW60Pv9HBrpCYwA8akTA8ig1pkJ+yv6j5o4gQV OgAxkfE7yL4MhifG5P4+7F18NVL2kT7gsTMp6C+PtSswXTu0T7lwsCs7/qTIRVmKfu14 /q4g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:message-id:references :in-reply-to:subject:cc:to:from:date:content-transfer-encoding :mime-version:dkim-signature; bh=mC8yLu/KAVSTRgo4O8zb9/xH/WELRUzZ3StXsb1jriM=; b=ucngbQU+vipVo6VNc/9VLb1QqXY4/2L4bNz5DhzSWGbyuVbfjz+G1jbtrzuKJ1XPt9 0dVnnq157zGGuLnhmFMB4I426pE2Fbh/MUOO8eoEEgJ0Qqc3kToMcOtNJDgM8i545XAe zaCX1oNjb5Bx9Oi463xPkKnQioXmL2OAVQ/szpDcaibwVSDYVqnKFlOVip6pXj/CS6hV CcwNYsVOU8VdOnW7r1yOSiffQY08VYBrKJO/XFgV13n2454a6IFMPvEWHGPWPLe+p7Ni ChbyHXJWlkuvpQKULseXJIkLxsdMGUk6Qe25l/27DCJa2aUq9x+9qvAld0qgllWcSCqZ Zq4Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Ut25Fgrq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j19si7906495ejx.382.2020.04.27.01.29.22; Mon, 27 Apr 2020 01:29:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Ut25Fgrq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726760AbgD0I11 (ORCPT + 99 others); Mon, 27 Apr 2020 04:27:27 -0400 Received: from mail.kernel.org ([198.145.29.99]:58488 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726003AbgD0I11 (ORCPT ); Mon, 27 Apr 2020 04:27:27 -0400 Received: from disco-boy.misterjones.org (disco-boy.misterjones.org [51.254.78.96]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2913920661; Mon, 27 Apr 2020 08:27:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1587976046; bh=WlS5jPTB2ToO57TX6wf76UBIN8mQyKkRwqeX57hjmeM=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=Ut25FgrqyPtbjntQzmKBAd39JvOY/jqNVT5t86r9pJq6xhvLAyR+UcKRz4+lKcq1Q +ufIvuCsLXqK4vpmfbC4RrIAazS7STHkjD4hjfKhBDqR2M/gtsX3lJD2uZqvLBuYL+ 0sJHSAsMEq0UB2DgsKS/7wSpPinyxpbBouQNrJxM= Received: from disco-boy.misterjones.org ([51.254.78.96] helo=www.loen.fr) by disco-boy.misterjones.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1jSz6q-006fSJ-Eo; Mon, 27 Apr 2020 09:27:24 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 27 Apr 2020 09:27:24 +0100 From: Marc Zyngier To: Anup Patel Cc: Zong Li , Palmer Dabbelt , Paul Walmsley , "linux-kernel@vger.kernel.org List" , linux-riscv , David Abdurachmanov Subject: Re: [PATCH] irqchip/sifive-plic: allow many cores to handle IRQs In-Reply-To: References: <20200426110740.123638-1-zong.li@sifive.com> Message-ID: <60083bbdfe6a6299efc8d07d8a02d647@kernel.org> X-Sender: maz@kernel.org User-Agent: Roundcube Webmail/1.3.10 X-SA-Exim-Connect-IP: 51.254.78.96 X-SA-Exim-Rcpt-To: anup@brainfault.org, zong.li@sifive.com, palmer@dabbelt.com, paul.walmsley@sifive.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, david.abdurachmanov@sifive.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Anup, On 2020-04-26 14:38, Anup Patel wrote: > +Mark Z s/k/c/ ;-) > > On Sun, Apr 26, 2020 at 6:49 PM Zong Li wrote: >> >> On Sun, Apr 26, 2020 at 8:47 PM Anup Patel >> wrote: >> > >> > On Sun, Apr 26, 2020 at 4:37 PM Zong Li wrote: >> > > >> > > Currently, driver forces the IRQs to be handled by only one core. This >> > > patch provides the way to enable others cores to handle IRQs if needed, >> > > so users could decide how many cores they wanted on default by boot >> > > argument. >> > > >> > > Use 'irqaffinity' boot argument to determine affinity. If there is no >> > > irqaffinity in dts or kernel configuration, use irq default affinity, >> > > so all harts would try to claim IRQ. >> > > >> > > For example, add irqaffinity=0 in chosen node to set irq affinity to >> > > hart 0. It also supports more than one harts to handle irq, such as set >> > > irqaffinity=0,3,4. >> > > >> > > You can change IRQ affinity from user-space using procfs. For example, >> > > you can make CPU0 and CPU2 serve IRQ together by the following command: >> > > >> > > echo 4 > /proc/irq//smp_affinity >> > > >> > > Signed-off-by: Zong Li >> > > --- >> > > drivers/irqchip/irq-sifive-plic.c | 21 +++++++-------------- >> > > 1 file changed, 7 insertions(+), 14 deletions(-) >> > > >> > > diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c >> > > index d0a71febdadc..bc1440d54185 100644 >> > > --- a/drivers/irqchip/irq-sifive-plic.c >> > > +++ b/drivers/irqchip/irq-sifive-plic.c >> > > @@ -111,15 +111,12 @@ static inline void plic_irq_toggle(const struct cpumask *mask, >> > > static void plic_irq_unmask(struct irq_data *d) >> > > { >> > > struct cpumask amask; >> > > - unsigned int cpu; >> > > struct plic_priv *priv = irq_get_chip_data(d->irq); >> > > >> > > cpumask_and(&amask, &priv->lmask, cpu_online_mask); >> > > - cpu = cpumask_any_and(irq_data_get_affinity_mask(d), >> > > - &amask); >> > > - if (WARN_ON_ONCE(cpu >= nr_cpu_ids)) >> > > - return; >> > > - plic_irq_toggle(cpumask_of(cpu), d, 1); >> > > + cpumask_and(&amask, &amask, irq_data_get_affinity_mask(d)); >> > > + >> > > + plic_irq_toggle(&amask, d, 1); >> > > } >> > > >> > > static void plic_irq_mask(struct irq_data *d) >> > > @@ -133,24 +130,20 @@ static void plic_irq_mask(struct irq_data *d) >> > > static int plic_set_affinity(struct irq_data *d, >> > > const struct cpumask *mask_val, bool force) >> > > { >> > > - unsigned int cpu; >> > > struct cpumask amask; >> > > struct plic_priv *priv = irq_get_chip_data(d->irq); >> > > >> > > cpumask_and(&amask, &priv->lmask, mask_val); >> > > >> > > if (force) >> > > - cpu = cpumask_first(&amask); >> > > + cpumask_copy(&amask, mask_val); >> > > else >> > > - cpu = cpumask_any_and(&amask, cpu_online_mask); >> > > - >> > > - if (cpu >= nr_cpu_ids) >> > > - return -EINVAL; >> > > + cpumask_and(&amask, &amask, cpu_online_mask); >> > > >> > > plic_irq_toggle(&priv->lmask, d, 0); >> > > - plic_irq_toggle(cpumask_of(cpu), d, 1); >> > > + plic_irq_toggle(&amask, d, 1); >> > > >> > > - irq_data_update_effective_affinity(d, cpumask_of(cpu)); >> > > + irq_data_update_effective_affinity(d, &amask); >> > > >> > > return IRQ_SET_MASK_OK_DONE; >> > > } >> > > -- >> > > 2.26.1 >> > > >> > >> > I strongly oppose (NACK) this patch due to performance reasons. >> > >> > In PLIC, if we enable an IRQ X for N CPUs then when IRQ X occurs: >> > 1) All N CPUs will take interrupt >> > 2) All N CPUs will try to read PLIC CLAIM register >> > 3) Only one of the CPUs will see IRQ X using the CLAIM register >> > but other N - 1 CPUs will see no interrupt and return back to what >> > they were doing. In other words, N - 1 CPUs will just waste CPU >> > every time IRQ X occurs. >> > >> > Example1, one Application doing heavy network traffic will >> > degrade performance of other applications because with every >> > network RX/TX interrupt N-1 CPUs will waste CPU trying to >> > process network interrupt. >> > >> > Example1, one Application doing heavy MMC/SD traffic will >> > degrade performance of other applications because with every >> > SPI read/write interrupt N-1 CPUs will waste CPU trying to >> > process it. >> > >> > In fact, the current PLIC approach is actually a performance >> > optimization. This implementation also works fine with in-kernel >> > load-balancer and user space load balancer. >> > >> >> Yes, it's exactly, I know what you pointed out. But the idea of this >> patch is just providing a way that users could enable other cores if >> they wanted, it could still enable only one core by this change. The >> purpose here is thinking of flexible use, rather than limitation. >> Maybe it would be a happy medium that we make the default case enable >> only one core? It is a good open discussion. > > Making the default case as enable only one core is just a work-around. > > As-per my understanding, if we set affinity mask of N CPUs for IRQ X > then it does not mean all N CPUs should receive IRQ X rather it means > that exactly one of the N CPUs will receive IRQ X and the IRQ receiving > CPU will be fixed (reflected by effective affinity returned by the > driver). > > If we ignore above semantics and still provide a mechanism to target > IRQ X to N CPUs then most likely someone will try and run into > performance issues. > > Please don't go this path. The performance impact in case of Guest/VM > is even worst because PLIC is trap-n-emulated by hypervisors as MMIO > device. Just to add my view on this, as we get the same request on the ARM side once every other year: - the "broadcast" aspect of the interrupt signalling is usually a waste of CPU cycles on all CPUs but the one that has actually takes it. - the fact that you have to use MMIO to interact with the interrupt controller potentially makes this even worse if you don't have some per-CPU datapath that will handle this really fast. - not hitting the same CPU at all times for a given interrupt means that the caches are usually cold for this interrupt, meaning the gain in latency is pretty dubious. There is a very small (but also very vocal) fringe of users that insist that they cannot live without this broadcast aspect, because it allows them to shave a couple of cycles off a trivial interrupt handler on systems that are otherwise mostly idle. My answer to this is that Linux is a general purpose OS, and not a glorified interrupt handler. We aim to have balanced response times, and give *userspace* control over where the interrupt happens in the long run. HTH, M. -- Jazz is not dead. It just smells funny...