Received: by 2002:a05:6a10:a841:0:0:0:0 with SMTP id d1csp1011404pxy; Thu, 22 Apr 2021 20:30:55 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwaxPNKuL5tPaeLFwmiJXRK0FezkWwg9BB3sjV+n0vwMay527KSVTKbTbVQxz+oMdalDG9O X-Received: by 2002:a17:90a:2acb:: with SMTP id i11mr2004653pjg.131.1619148655146; Thu, 22 Apr 2021 20:30:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1619148655; cv=none; d=google.com; s=arc-20160816; b=1CS+dw0foQYuUpdZvXp98yWnYA2ur1nezrH6k9m6gtxcgZWFmGj70u3tfjHqDk3kEp 3vLeL2cT3zl0AjjBiF0IeEIrn/H5iDjISPYVy99CjAltJSXmiS7VMbXpZnl9Y6/Swgtt BvseWVlJ7p2ezJ/KzbcUxGA4jhxratjch5h67zxClmnI2iJ9V9Dg+LK6gLXQkK1UtErv FPj72xjM7mVgHmRDzY8a8EH0BKrRxt8R333T8nfU5Q7HSDi3PT2Wym92CgrVjOFdylCR SF01MHu7JwPvtcnsfK+3Q6DrV3LkR+82l05TSaYoKiVHSqHDmWH25yPq/dxZL8rFsas9 JP9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=QNkpj+4CDSfGEPMsG2pVvCx7WWLAOlJIqvGGG//tfnE=; b=Y8x30gwujLub2mtbhIjgmc9ZfWT78fx0agFZ0OxZEhqnNN2QB9b4eaiqgui7dFSoeK 2RMNgXz55WfS//Hruigs2fiOzpiFenfwoiyBOioA/hZM7OrycdPsEg/tCbqPHgKNRWXy PsPpnbearMNhkyD6HCiVC7GW2MTEfnXdmzZ5IhQFLzkYnScteh70KRvqicgS21fyaLH2 peNzp6vmqGK3tlhvxeW5hPGOUoGO2z1iPG+sqNjEcjJnXqSCJ6PU7o/3Co6+eswALyKm UuUTrlFggX+khDoUCkBEjGnVKK/18ZudmLIRxX/WHy/sky8YbOZaIaRA6YQknq989SI9 NN7g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q15si5530915pgu.420.2021.04.22.20.30.42; Thu, 22 Apr 2021 20:30:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240190AbhDWDa0 (ORCPT + 99 others); Thu, 22 Apr 2021 23:30:26 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:17029 "EHLO szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231552AbhDWDaZ (ORCPT ); Thu, 22 Apr 2021 23:30:25 -0400 Received: from DGGEMS405-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4FRKTK5RwJzPtVF; Fri, 23 Apr 2021 11:26:45 +0800 (CST) Received: from [10.67.110.136] (10.67.110.136) by DGGEMS405-HUB.china.huawei.com (10.3.19.205) with Microsoft SMTP Server id 14.3.498.0; Fri, 23 Apr 2021 11:29:44 +0800 Subject: Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups To: Marc Zyngier CC: , , , , , Mark Rutland References: <20210416062217.25157-1-heying24@huawei.com> <87y2dis4d7.wl-maz@kernel.org> <875z0eijxh.wl-maz@kernel.org> From: He Ying Message-ID: <47abc8a6-0f73-d1c0-789f-e979d4191ab2@huawei.com> Date: Fri, 23 Apr 2021 11:29:44 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <875z0eijxh.wl-maz@kernel.org> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.110.136] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2021/4/22 20:27, Marc Zyngier 写道: > On Sat, 17 Apr 2021 03:01:54 +0100, > He Ying wrote: >> Hello Marc, >> >> >> 在 2021/4/16 22:15, Marc Zyngier 写道: >>> [+ Mark] >>> >>> On Fri, 16 Apr 2021 07:22:17 +0100, >>> He Ying wrote: >>>> We found this problem in our kernel src tree: >>>> >>>> [ 14.816231] ------------[ cut here ]------------ >>>> [ 14.816231] kernel BUG at irq.c:99! >>>> [ 14.816232] Internal error: Oops - BUG: 0 [#1] SMP >>>> [ 14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____)) >>>> [ 14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.19.95-1.h1.AOS2.0.aarch64 #14 >>>> [ 14.816233] Hardware name: evb (DT) >>>> [ 14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO) >>>> [ 14.816234] pc : asm_nmi_enter+0x94/0x98 >>>> [ 14.816235] lr : asm_nmi_enter+0x18/0x98 >>>> [ 14.816235] sp : ffff000008003c50 >>>> [ 14.816235] pmr_save: 00000070 >>>> [ 14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0 >>>> [ 14.816238] x27: 0000000000000000 x26: ffff000008004000 >>>> [ 14.816239] x25: 00000000015e0000 x24: ffff8008fb916000 >>>> [ 14.816240] x23: 0000000020400005 x22: ffff0000080817cc >>>> [ 14.816241] x21: ffff000008003da0 x20: 0000000000000060 >>>> [ 14.816242] x19: 00000000000003ff x18: ffffffffffffffff >>>> [ 14.816243] x17: 0000000000000008 x16: 003d090000000000 >>>> [ 14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40 >>>> [ 14.816244] x13: ffff8008fff58b9d x12: 0000000000000000 >>>> [ 14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5 >>>> [ 14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f >>>> [ 14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e >>>> [ 14.816248] x5 : 0000000000000000 x4 : 0000000080000000 >>>> [ 14.816249] x3 : 0000000000000000 x2 : 0000000080000000 >>>> [ 14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0 >>>> [ 14.816251] Call trace: >>>> [ 14.816251] asm_nmi_enter+0x94/0x98 >>>> [ 14.816251] el1_irq+0x8c/0x180 >>>> [ 14.816252] gic_handle_irq+0xbc/0x2e4 >>>> [ 14.816252] el1_irq+0xcc/0x180 >>>> [ 14.816253] arch_timer_handler_virt+0x38/0x58 >>>> [ 14.816253] handle_percpu_devid_irq+0x90/0x240 >>>> [ 14.816253] generic_handle_irq+0x34/0x50 >>>> [ 14.816254] __handle_domain_irq+0x68/0xc0 >>>> [ 14.816254] gic_handle_irq+0xf8/0x2e4 >>>> [ 14.816255] el1_irq+0xcc/0x180 >>>> [ 14.816255] arch_cpu_idle+0x34/0x1c8 >>>> [ 14.816255] default_idle_call+0x24/0x44 >>>> [ 14.816256] do_idle+0x1d0/0x2c8 >>>> [ 14.816256] cpu_startup_entry+0x28/0x30 >>>> [ 14.816256] rest_init+0xb8/0xc8 >>>> [ 14.816257] start_kernel+0x4c8/0x4f4 >>>> [ 14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000) >>>> [ 14.816258] Modules linked in: start_dp(O) smeth(O) >>>> [ 15.103092] ---[ end trace 701753956cb14aa8 ]--- >>>> [ 15.103093] Kernel panic - not syncing: Fatal exception in interrupt >>>> [ 15.103099] SMP: stopping secondary CPUs >>>> [ 15.103100] Kernel Offset: disabled >>>> [ 15.103100] CPU features: 0x36,a2400218 >>>> [ 15.103100] Memory Limit: none >>> Urgh... >>> >>>> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI >>>> patches but doesn't support nested NMI. Its top relative commit is >>>> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs"). >>> Can you please reproduce it with mainline and without any backport? >>> It is hard to reason about something that isn't a vanilla kernel. >> I think our kernel is quite like v5.3 mainline. Reproducing it in >> v5.3 mainline may be a little difficult for us because our product >> needs some more self developed patches to work. > I don't really care about 5.3. What I care about is the tip of the > tree, and anything we fix there can trickle down to the previous > stable releases. I don't know if I give the tip of tree. I provide the Fixes tag "Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs")". Is that sufficient to help? > >>>> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())' >>>> in nmi_enter(). From the call trace, we find two 'el1_irqs' which >>>> means an interrupt preempts the other one and the new one is an NMI. >>>> Furthermore, by adding some prints, we find the first irq also calls >>>> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number >>>> is 1023. It enables irq by calling gic_arch_enable_irqs() in >>>> gic_handle_irq(). At this moment, the second irq preempts the first irq >>>> and it's an NMI but current context is already in nmi. So that may be >>>> the problem. >>> I'm not sure I get it. From the stack trace, I see this: >>> >>> [ 14.816251] asm_nmi_enter+0x94/0x98 >>> [ 14.816251] el1_irq+0x8c/0x180 (C) >>> [ 14.816252] gic_handle_irq+0xbc/0x2e4 >>> [ 14.816252] el1_irq+0xcc/0x180 (B) >>> [ 14.816253] arch_timer_handler_virt+0x38/0x58 >>> [ 14.816253] handle_percpu_devid_irq+0x90/0x240 >>> [ 14.816253] generic_handle_irq+0x34/0x50 >>> [ 14.816254] __handle_domain_irq+0x68/0xc0 >>> [ 14.816254] gic_handle_irq+0xf8/0x2e4 >>> [ 14.816255] el1_irq+0xcc/0x180 (A) >>> >>> which indicates that we preempted a timer interrupt (A) with another >>> IRQ (B), itself immediately preempted by another IRQ (C)? That's >>> indeed at least one too many. >>> >>> Can you please describe for each of (A), (B) and (C) whether they are >>> spurious or not, what their priorities are if they aren't spurious? >> Yes. I ignored interrupt (A). (B) is spurious and its priority is >> 0xa0 and PMR is 0x70. (C) is an NMI and its priority is 0x20. Note >> that GIC_PRIO_IRQON is 0xe0, GIC_PRIO_IRQOFF is 0x60, >> GICD_INT_DEF_PRI is 0xa0 and GICD_INT_NMI_PRI is 0x20 in our kernel. > If (B) is spurious (aka ICC_IAR1R_EL1 return 1023), then its > "priority" doesn't really exist, and I don't really get what you mean > by "its priority is 0xa0". ICC_RPR_EL1 shouldn't change when Ack-ing > a spurious interrupt, because there is no change in GIC state at all. OK. By saying "its priority is 0xa0", I just mean ICC_RPR_EL1 is read as 0xa0. > > And if PMR is 0x70 at the point where you get (B), then I really can't > see how you can get an interrupt of priority 0xa0 anyway. Yes, it also confuses me. Perhaps ICC_RPR_EL1 changes when read, I think. > > > >>>> In my opinion, when handling spurious interrupts, we shouldn't enable irqs. >>>> My reason is that for spurious interrupts we may enter nmi context in >>>> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs >>>> at this time, another NMI may happen and preempt this spurious interrupt >>>> but the context is already in nmi. That causes a bug on if nested NMI is >>>> not supported. Even for nested nmi, I think it's not a normal scenario. >>> I would tend to agree that this isn't great. Actually, I'd probably >>> move the check for a spurious interrupt right after the read of >>> ICC_IAR1_EL1, because there is no real need to do anything else at >>> that point. >> So, we don't need to check NMI for spurious interrupts? Do you mean >> that spurious interrupts' can't be NMIs? Or even spurious interrups >> are NMIs, we shouldn't do anything for them? If so, I will move the >> check after the read of ICC_IAR1_EL1 and send a V2. > Spurious interrupts are not interrupts at all. It is either a level > interrupt that has been retired before being handled, or some other > transient effect (like an interrupt being moved from one CPU to > another), or even flaky HW. So doing anything based on a spurious > interrupt is definitely a potential bug, and I suggest this: > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c > index eb0ee356a629..00404024d7cd 100644 > --- a/drivers/irqchip/irq-gic-v3.c > +++ b/drivers/irqchip/irq-gic-v3.c > @@ -648,6 +648,10 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs > > irqnr = gic_read_iar(); > > + /* Check for special IDs first */ > + if ((irqnr >= 1020 && irqnr <= 1023)) > + return; > + > if (gic_supports_nmi() && > unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) { > gic_handle_nmi(irqnr, regs); > @@ -659,10 +663,6 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs > gic_arch_enable_irqs(); > } > > - /* Check for special IDs first */ > - if ((irqnr >= 1020 && irqnr <= 1023)) > - return; > - > if (static_branch_likely(&supports_deactivate_key)) > gic_write_eoir(irqnr); > else > > [...] Thanks for explaining it. I quite agree with your suggestion. >> In my opinion, since commit 17ce302f3117 ("arm64: Fix interrupt >> tracing in the presence of NMIs"), spurious interrups can enter nmi >> context in interrupt entry because PMR can be GIC_PRIO_IRQOFF for >> spurious interrupts. That means test_irqs_unmasked is not 0 and >> asm_nmi_enter is called. >> >>    (some el1_irq entry code from v5.3) >>    test_irqs_unmasked  res=x0, pmr=x20 >>    cbz x0, 1f >>    bl asm_nmi_enter > That code has significantly changed upstream, and is now in C. I think > it still do the same thing though. Yes, you're right. > >> And it then calls gic_handle_irq(). It doesn't call gic_handle_nmi() >> because its priority is not GICD_INT_NMI_PRI. >> >> Then it enables irqs. If at that point another NMI comes and >> preempts it, which means NMI occurs in nmi context. That may cause a >> bug on if nested NMI is not supported. >> >>    (some gic_handle_irq code from v5.3) >> >>    irqnr = gic_read_iar(); >> >>    if (gic_supports_nmi() && >>        unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) { >>            gic_handle_nmi(irqnr, >> regs);                                   (C) >>            return; >>    } >> >>    if (gic_prio_masking_enabled()) { >>          gic_pmr_mask_irqs(); >> gic_arch_enable_irqs(); (D) >>    } > I believe the above patch would fix the spurious interrupt issue you > have experienced. Please let me know, and post a v2 if this works for > you. Fortunately, it works. I'll post a v2. Should I cc stable mail list? Thanks.