Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp2554838pxb; Thu, 11 Feb 2021 16:00:46 -0800 (PST) X-Google-Smtp-Source: ABdhPJxM3BW2twH486xwHidTU8hnIzSN8ZfaRerkUOL0YlMOp32uAlRTT16TyvCyVXDE24C1gD4/ X-Received: by 2002:a05:6402:94f:: with SMTP id h15mr631105edz.199.1613088045991; Thu, 11 Feb 2021 16:00:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613088045; cv=none; d=google.com; s=arc-20160816; b=GPjjSC2xPmufUI0Zr/y5yfHEWlE1ZImDsoZthgA2c7pavjKvG9OMbQ3q7p2CZ9g1w8 MAKp5nuc3+Lo2saXjXSZbOcMDBUnFlPubPOBRX7GD1/7szn/BJAYTOQG7vaZ6T9wDr3W XCzA04qqOWcupoo7/KsyUpN/0Nz0hNj5QcPY/ttfGhF66BfWeHFZbodmV2Z8GwoJvArk KqZ4wD972cTRhNSU5kX/LKtCncUDFMdr8HjmLO+vgMGJcWn/XcdQ2gSnjSIap/Okpcq3 alwPlZqaKtB0pQrOyl60hZ4QNd0egxmoSkUsqO+AQGZCrixIigMUqr3Jglgg1fXZFnLh ve5A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-id:mime-version:references:message-id :in-reply-to:subject:cc:to:from:date; bh=T7LA3tKHGiy4FEuwO6wybTmx3gHNyykQEUmJ6qBbzi0=; b=Ey4WF7ZfJqQ/pToLKLV3bQRvyZnOQV1+ihRLdD/Qy+hq46aBM54EKgqZqpHqn/5r00 IbP6inoF0FhnpFpiFsR8hqnd8jAODrmLqr5Wdbq76pA8uGytE4FRcrnPxA3FQbw5sUyn ccIBj9UisNwsQheS6gjoydapOuyQihn1LHCNgh4uwNv3v8keHbnbKxeOa2fn9zWf6JKk pezb8b1l+RNapL9LaZVY2pViU5ERcs5/6C4fQwE7X0DoslLTrKnN1bU1ojaSqj+vxgGO m11UvFOHC9gvqP/YeNjPlSohoFyXgG34COro9jkZK5LgsOrjs2j4TeAokXKNk0xBjfZs ft4g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g23si4729627edh.11.2021.02.11.16.00.22; Thu, 11 Feb 2021 16:00:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230050AbhBKX6t (ORCPT + 99 others); Thu, 11 Feb 2021 18:58:49 -0500 Received: from kvm5.telegraphics.com.au ([98.124.60.144]:52920 "EHLO kvm5.telegraphics.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229647AbhBKX6m (ORCPT ); Thu, 11 Feb 2021 18:58:42 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by kvm5.telegraphics.com.au (Postfix) with ESMTP id A43F92ADA4; Thu, 11 Feb 2021 18:57:57 -0500 (EST) Date: Fri, 12 Feb 2021 10:58:04 +1100 (AEDT) From: Finn Thain To: "Song Bao Hua (Barry Song)" cc: tanxiaofei , "jejb@linux.ibm.com" , "martin.petersen@oracle.com" , "linux-scsi@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linuxarm@openeuler.org" , "linux-m68k@vger.kernel.org" Subject: RE: [Linuxarm] Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers In-Reply-To: <4646dd5cb26d4e1195951228c46fbff6@hisilicon.com> Message-ID: <8296cd-85d0-6b2e-3291-9a73db553ad8@telegraphics.com.au> References: <1612697823-8073-1-git-send-email-tanxiaofei@huawei.com> <31cd807d-3d0-ed64-60d-fde32cb3833c@telegraphics.com.au> <88d26bd86c314e5483ec596952054be7@hisilicon.com> <00c06b19e87a425fa3a4b6aaecc66d49@hisilicon.com> <9611728-3e7-3954-cfee-f3d3cf45df6@telegraphics.com.au> <13c414b9bd7940caa5e1df810356dcfd@hisilicon.com> <221cb29-53a8-fd1-4232-360655f28f3@telegraphics.com.au> <3ec7cb32aa754a59b894d048873132cf@hisilicon.com> <9d248ea6-f861-850-ba71-ac2cdd5596ff@telegraphics.com.au> <4646dd5cb26d4e1195951228c46fbff6@hisilicon.com> MIME-Version: 1.0 Content-Type: multipart/mixed; BOUNDARY="-1463811774-827468299-1613084310=:6" Content-ID: <78e7903f-6cc5-33d7-349a-13c12c8ea@telegraphics.com.au> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---1463811774-827468299-1613084310=:6 Content-Type: text/plain; CHARSET=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Content-ID: <989b9068-7be-9a98-58d-73ad484bb9c0@telegraphics.com.au> On Thu, 11 Feb 2021, Song Bao Hua (Barry Song) wrote: > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote: > >=20 > > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote: > > > > > > > > > TBH, that is why m68k is so confusing. irqs_disabled() on m68k=20 > > > > > should just reflect the status of all interrupts have been=20 > > > > > disabled except NMI. > > > > > > > > > > irqs_disabled() should be consistent with the calling of APIs=20 > > > > > such as local_irq_disable, local_irq_save, spin_lock_irqsave=20 > > > > > etc. > > > > > > > > > > > > > When irqs_disabled() returns true, we cannot infer that=20 > > > > arch_local_irq_disable() was called. But I have not yet found=20 > > > > driver code or core kernel code attempting that inference. > > > > > > > > > > > > > > > > > Isn't arch_irqs_disabled() a status reflection of irq=20 > > > > > > > disable API? > > > > > > > > > > > > > > > > > > > Why not? > > > > > > > > > > If so, arch_irqs_disabled() should mean all interrupts have been= =20 > > > > > masked except NMI as NMI is unmaskable. > > > > > > > > > > > > > Can you support that claim with a reference to core kernel code or= =20 > > > > documentation? (If some arch code agrees with you, that's neither= =20 > > > > here nor there.) > > > > > > I think those links I share you have supported this. Just you don't= =20 > > > believe :-) > > > > >=20 > > Your links show that the distinction between fast and slow handlers=20 > > was removed. Your links don't support your claim that=20 > > "arch_irqs_disabled() should mean all interrupts have been masked".=20 > > Where is the code that makes that inference? Where is the=20 > > documentation that supports your claim? >=20 > (1) > https://lwn.net/Articles/380931/ > Looking at all these worries, one might well wonder if a system which=20 > *disabled interrupts for all handlers* would function well at all. So it= =20 > is interesting to note one thing: any system which has the lockdep=20 > locking checker enabled has been running all handlers that way for some= =20 > years now. Many developers and testers run lockdep-enabled kernels, and= =20 > they are available for some of the more adventurous distributions=20 > (Rawhide, for example) as well. So we have quite a bit of test coverage= =20 > for this mode of operation already. >=20 IIUC, your claim is that CONFIG_LOCKDEP involves code that contains the=20 inference, "arch_irqs_disabled() means all interrupts have been masked". Unfortunately, m68k lacks CONFIG_LOCKDEP support so I can't easily confirm= =20 this. I suppose there may be other architectures that support both LOCKDEP= =20 and nested interrupts (?) Anyway, if you would point to the code that contains said inference, that= =20 would help a lot. > (2) > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit= /?id=3Db738a50a >=20 > "We run all handlers *with interrupts disabled* and expect them not to > enable them. Warn when we catch one who does." >=20 Again, you don't see that warning because irqs_disabled() correctly=20 returns true. You can confirm this fact in QEMU or Aranym if you are=20 interested. > (3)=20 > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit= /?id=3De58aa3d2d0cc > genirq: Run irq handlers *with interrupts disabled* >=20 > Running interrupt handlers with interrupts enabled can cause stack=20 > overflows. That has been observed with multiqueue NICs delivering all=20 > their interrupts to a single core. We might band aid that somehow by=20 > checking the interrupt stacks, but the real safe fix is to *run the irq= =20 > handlers with interrupts disabled*. >=20 Again, the stack overflow issue is not applicable. 68000 uses a priority=20 mask, like ARM GIC. So there's no arbitrary nesting of interrupt handlers.= =20 In practice stack overflows simply don't occur on m68k. Please do try it. >=20 > All these documents say we are running irq handler with interrupts > disabled. but it seems you think high-prio interrupts don't belong > to "interrupts" in those documents :-) >=20 > that is why we can't get agreement. I think "interrupts" mean all except= =20 > NMI in these documents, but you insist high-prio IRQ is an exception. >=20 We can't get agreement because you seek to remove functionality without=20 justification. > > > > > > > > > > > > > > > > Are all interrupts (including NMI) masked whenever=20 > > > > > > arch_irqs_disabled() returns true on your platforms? > > > > > > > > > > On my platform, once irqs_disabled() is true, all interrupts are= =20 > > > > > masked except NMI. NMI just ignore spin_lock_irqsave or=20 > > > > > local_irq_disable. > > > > > > > > > > On ARM64, we also have high-priority interrupts, but they are > > > > > running as PESUDO_NMI: > > > > > https://lwn.net/Articles/755906/ > > > > > > > > > > > > > A glance at the ARM GIC specification suggests that your hardware= =20 > > > > works much like 68000 hardware. > > > > > > > > When enabled, a CPU interface takes the highest priority=20 > > > > pending interrupt for its connected processor and determines=20 > > > > whether the interrupt has sufficient priority for it to signal= =20 > > > > the interrupt request to the processor. [...] > > > > > > > > When the processor acknowledges the interrupt at the CPU=20 > > > > interface, the Distributor changes the status of the interrupt= =20 > > > > from pending to either active, or active and pending. At this=20 > > > > point the CPU interface can signal another interrupt to the=20 > > > > processor, to preempt interrupts that are active on the=20 > > > > processor. If there is no pending interrupt with sufficient=20 > > > > priority for signaling to the processor, the interface=20 > > > > deasserts the interrupt request signal to the processor. > > > > > > > > https://developer.arm.com/documentation/ihi0048/b/ > > > > > > > > Have you considered that Linux/arm might benefit if it could fully= =20 > > > > exploit hardware features already available, such as the interrupt= =20 > > > > priority masking feature in the GIC in existing arm systems? > > > > > > I guess no:-) there are only two levels: IRQ and NMI. Injecting a=20 > > > high-prio IRQ level between them makes no sense. > > > > > > To me, arm64's design is quite clear and has no any confusion. > > > > >=20 > > Are you saying that the ARM64 hardware design is confusing because it= =20 > > implements a priority mask, and that's why you had to simplify it with= =20 > > a pseudo-nmi scheme in software? >=20 > No, I was not saying this. I think both m68k and arm64 have good=20 > hardware design. Just Linux's implementation is running irq-handlers=20 > with interrupts disabled. So ARM64's pseudo-nmi is adapted to Linux=20 > better. >=20 So, a platform should do what all the other platforms do because to=20 deviate would be too dangerous? > > > > > > > > > On m68k, it seems you mean=EF=BC=9A > > > > > irq_disabled() is true, but high-priority interrupts can still=20 > > > > > come; local_irq_disable() can disable high-priority interrupts,= =20 > > > > > and at that time, irq_disabled() is also true. > > > > > > > > > > TBH, this is wrong and confusing on m68k. > > > > > > > > > > > > > Like you, I was surprised when I learned about it. But that=20 > > > > doesn't mean it's wrong. The fact that it works should tell you=20 > > > > something. > > > > > > > > > > The fact is that m68k lets arch_irq_disabled() return true to=20 > > > pretend all IRQs are disabled while high-priority IRQ is still open,= =20 > > > thus "pass" all sanitizing check in genirq and kernel core. > > > > >=20 > > The fact is that m68k has arch_irq_disabled() return false when all=20 > > IRQs are enabled. So there is no bug. >=20 > But it has arch_irq_disabled() return true while some interrupts(not=20 > NMI) are still open. >=20 > >=20 > > > > Things could always be made simpler. But discarding features isn't= =20 > > > > necessarily an improvement. > > > > > > This feature could be used by calling local_irq_enable_in_hardirq()= =20 > > > in those IRQ handlers who hope high-priority interrupts to preempt=20 > > > it for a while. > > > > >=20 > > So, if one handler is sensitive to interrupt latency, all other=20 > > handlers should be modified? I don't think that's workable. >=20 > I think we just enable preempt_rt or force threaded_irq, and then=20 > improve the priority of the irq thread who is sensitive to latency. No=20 > need to touch all threads. >=20 > I also understand your point, we let one high-prio interrupt preempt low= =20 > priority interrupt, then we don't need to change the whole system. But I= =20 > think Linux prefers the method of threaded_irq or preempt_rt for this=20 > kind of problems. >=20 So, some interrupt (or exception) processing happens atomically and the=20 rest is deferred to a different execution context. (Not a new idea.) If you introduce latency in the former context you can't win it back in=20 the latter. Your solution fails because it adds latency to high priority=20 handlers. > >=20 > > In anycase, what you're describing is a completely different nested=20 > > interrupt scheme that would defeat the priority level mechanism that=20 > > the hardware provides us with. >=20 > Yes. Indeed. >=20 > >=20 > > > It shouldn't hide somewhere and make confusion. > > > > >=20 > > The problem is hiding so well that no-one has found it! I say it=20 > > doesn't exist. >=20 > Long long ago(before 2.6.38), we had a kernel supporting IRQF_DISABLED=20 > and nested interrupts were widely supported, but system also ran well in= =20 > most cases. That means nested interrupts don't really matter in most=20 > cases. That is why m68k is also running well even though it is still=20 > nesting. >=20 No, m68k runs well because it uses priority masking. It is not because=20 some cases are untested. Your hardware may not have been around for 4 decades but it implements the= =20 same capability because the design is known to work. > >=20 > > > On the other hand, those who care about realtime should use threaded= =20 > > > IRQ and let IRQ threads preempt each other. > > > > >=20 > > Yes. And those threads also have priority levels. >=20 > Finn, I am not a m68k guy, would you help check if this could activate a > warning on m68k. maybe we can discuss this question in genirq maillist fr= om > this warning if you are happy. Thanks very much. >=20 > diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h > index 7c9d6a2d7e90..b8ca27555c76 100644 > --- a/include/linux/hardirq.h > +++ b/include/linux/hardirq.h > @@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(vo= id) > */ > #define __irq_enter() \ > do { \ > + WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interr= upts\n"); \ > preempt_count_add(HARDIRQ_OFFSET); \ > lockdep_hardirq_enter(); \ > account_hardirq_enter(current); \ > @@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(vo= id) > */ > #define __irq_enter_raw() \ > do { \ > + WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interr= upts\n"); \ > preempt_count_add(HARDIRQ_OFFSET); \ > lockdep_hardirq_enter(); \ > } while (0) >=20 If you think that lockdep or some other code somewhere should be protected= =20 in this way, perhaps you can point to that code. Otherwise, your patch=20 seems to lack any justification. > Best Regards > Barry >=20 >=20 ---1463811774-827468299-1613084310=:6--