Date: Thu, 13 Oct 2016 00:30:36 -0700
From: "Paul E. McKenney"
To: Rich Felker
Cc: Thomas Gleixner, linux-kernel@vger.kernel.org, linux-sh@vger.kernel.org, Jason Cooper, Marc Zyngier, Daniel Lezcano
Subject: Re: [PATCH] irqchip/jcore: fix lost per-cpu interrupts
Reply-To: paulmck@linux.vnet.ibm.com
References: <41fc74d0bdea4c0efc269150b78d72b2b26cb38c.1475992312.git.dalias@libc.org> <20161009144715.GB19318@brightrain.aerifal.cx> <20161011152140.GH19318@brightrain.aerifal.cx> <20161012163543.GN19318@brightrain.aerifal.cx> <20161012203417.GA8847@linux.vnet.ibm.com> <20161012221927.GR19318@brightrain.aerifal.cx>
In-Reply-To: <20161012221927.GR19318@brightrain.aerifal.cx>
Message-Id: <20161013073036.GO29518@linux.vnet.ibm.com>
On Wed, Oct 12, 2016 at 06:19:27PM -0400, Rich Felker wrote:
> On Wed, Oct 12, 2016 at 01:34:17PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 12, 2016 at 12:35:43PM -0400, Rich Felker wrote:
> > > On Wed, Oct 12, 2016 at 10:18:02AM +0200, Thomas Gleixner wrote:
> > > > On Tue, 11 Oct 2016, Rich Felker wrote:
> > > > > On Sun, Oct 09, 2016 at 09:23:58PM +0200, Thomas Gleixner wrote:
> > > > > > On Sun, 9 Oct 2016, Rich Felker wrote:
> > > > > > > On Sun, Oct 09, 2016 at 01:03:10PM +0200, Thomas Gleixner wrote:
> > > > > > > My preference would just be to keep the branch, but with your
> > > > > > > improved version that doesn't need a function call:
> > > > > > > 
> > > > > > > irqd_is_per_cpu(irq_desc_get_irq_data(desc))
> > > > > > > 
> > > > > > > While there is some overhead testing this condition every
> > > > > > > time, I can probably come up with several better places to
> > > > > > > look for a ~10 cycle improvement in the irq code path without
> > > > > > > imposing new requirements on the DT bindings.
> > > > > > 
> > > > > > Fair enough. Your call.
> > > > > > 
> > > > > > > As noted in my followup to the clocksource stall thread,
> > > > > > > there's also a possibility that it might make sense to
> > > > > > > consider the current behavior of having non-percpu irqs bound
> > > > > > > to a particular cpu as part of what's required by the
> > > > > > > compatible tag, in which case handle_percpu_irq or something
> > > > > > > similar/equivalent might be suitable for both the percpu and
> > > > > > > non-percpu cases. I don't understand the irq subsystem well
> > > > > > > enough to insist on that but I think it's worth consideration
> > > > > > > since it looks like it would improve performance of
> > > > > > > non-percpu interrupts a bit.
> > > > > > 
> > > > > > Well, you can use handle_percpu_irq() for your device
> > > > > > interrupts if you guarantee at the hardware level that there is
> > > > > > no reentrancy. Once you make the hardware capable of delivering
> > > > > > them on either core the picture changes.
> > > > > 
> > > > > One more concern here -- I see that handle_simple_irq is handling
> > > > > the soft-disable / IRQS_PENDING flag behavior, and irq_check_poll
> > > > > stuff that's perhaps important too. Since soft-disable is all we
> > > > > have (there's no hard-disable of interrupts), is this a problem?
> > > > > In other words, can drivers have an expectation of not receiving
> > > > > interrupts when the irq is disabled? I would think anything
> > > > > compatible with irq sharing can't have such an expectation, but
> > > > > perhaps the kernel needs disabling internally for synchronization
> > > > > at module-unload time or similar cases?
> > > > 
> > > > Sure. A driver would be surprised getting an interrupt when it is
> > > > disabled, but with your exceptionally well thought out interrupt
> > > > controller a pending (level) interrupt which is not handled will be
> > > > reraised forever and just hard lock the machine.
> > > 
> > > If you want to criticize the interrupt controller design (not my
> > > work or under my control) for limitations in the type of hardware
> > > that can be hooked up to it, that's okay -- this kind of input will
> > > actually be useful for designing the next iteration of it -- but I
> > > don't think this specific possibility is a concern.
> > 
> > Well, if this scenario does happen, the machine will likely either
> > lock up silently and hard, give you RCU CPU stall warning messages,
> > or give you soft-lockup messages.
> 
> The same situation can happen with badly-behaved hardware under
> software interrupt control too if it keeps generating interrupts
> rapidly (more quickly than the cpu can handle them), unless the kernel
> has some kind of framework for disabling the interrupt and only
> reenabling it later via a timer. It's equivalent to a realtime-prio
> process failing to block/sleep to give lower-priority processes a
> chance to run.

Indeed, there are quite a few scenarios that can lead to silent hard
lockups, RCU CPU stall warning messages, or soft-lockup messages.

							Thanx, Paul