Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759037AbaGONEH (ORCPT ); Tue, 15 Jul 2014 09:04:07 -0400 Received: from mail-oa0-f54.google.com ([209.85.219.54]:46805 "EHLO mail-oa0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758991AbaGONEF (ORCPT ); Tue, 15 Jul 2014 09:04:05 -0400 MIME-Version: 1.0 In-Reply-To: <53C4F745.3070701@linaro.org> References: <1404118391-3850-1-git-send-email-daniel.thompson@linaro.org> <1404979427-12943-1-git-send-email-daniel.thompson@linaro.org> <53C4F745.3070701@linaro.org> Date: Tue, 15 Jul 2014 15:04:04 +0200 Message-ID: Subject: Re: [PATCH v8 0/4] arm: KGDB NMI/FIQ support From: Harro Haan To: Daniel Thompson Cc: Russell King , linaro-kernel@lists.linaro.org, Catalin Marinas , patches@linaro.org, kgdb-bugreport@lists.sourceforge.net, Linus Walleij , Nicolas Pitre , linux-kernel@vger.kernel.org, Frederic Weisbecker , Anton Vorontsov , Ben Dooks , John Stultz , Fabio Estevam , Colin Cross , kernel-team@android.com, Dave Martin , "linux-arm-kernel@lists.infradead.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 15 July 2014 11:41, Daniel Thompson wrote: > On 14/07/14 14:51, Harro Haan wrote: >> On 10 July 2014 10:03, Daniel Thompson wrote: >>> >>> This patchset makes it possible to use kgdb's NMI infrastructure on ARM >>> platforms. >>> >>> The patches have been previously circulated as part of a large patchset >>> mixing together ARM architecture code and driver changes >>> (http://thread.gmane.org/gmane.linux.ports.arm.kernel/333901 ). This >>> patchset is dramatically cut down to include only the arch/arm code. The >>> driver code (irqchip and tty/serial) will follow when/if the arch code >>> is accepted. >>> >>> The first two patches modify the FIQ infrastructure to allow interrupt >>> controller drivers to register callbacks (the fiq_chip structure) to >>> manage FIQ routings and to ACK and EOI the FIQ. This makes it possible >>> to use FIQ in multi-platform kernels and with recent ARM interrupt >>> controllers. >>> >> >> Daniel, >> >> Thanks for the patches. I have tested the fiq and irq-gic patches on >> an i.MX6 (SabreSD board) for a different purpose: >> A FIQ timer interrupt at 1 kHz. The TWD watchdog block is used in >> timer mode for this (interrupt ID 30). The GIC affinity is set to CPU >> core 0 only for this interrupt ID. >> >> I was surprised by the following behavior: >> $ cat /proc/interrupts >> CPU0 CPU1 CPU2 CPU3 >> 29: 5459 3381 3175 3029 GIC 29 twd >> 30: 11 0 0 0 GIC 30 fake-fiq >> >> Once every few seconds is interrupt ID 30 handled by the regular GIC >> handler instead of the FIQ exception path (which causes for example a >> bit more latencies). The other thousands of FIQ's are handled by the >> FIQ exception path. The problem is also confirmed by the following >> stackframe of the Lauterbach tooling: >> fake_fiq_handler(irq = 30, ...) >> handle_percpu_devid_irq(irq = 30, ...) >> generic_handle_irq(irq = 30) >> handle_IRQ(irq = 30, ...) >> gic_handle_irq >> __irq_svc >> -->exception >> >> Notes: >> - The problem will occur more often by executing the following command: >> $ while true; do hackbench 20; done & >> - Reading the GIC_CPU_INTACK register returns the interrupt ID of the >> highest priority pending interrupt. >> - The GIC_CPU_INTACK register (used by fiq_ack) will return spurious >> interrupt ID 0x3FF when read in fake_fiq_handler, because >> GIC_CPU_INTACK is already read by gic_handle_irq. >> - The FIQ exception will not occur anymore after gic_handle_irq >> read/acknowledges the FIQ by reading the GIC_CPU_INTACK register >> >> From the behavior above I conclude that the GIC_CPU_INTACK register is >> first updated before the FIQ exception is generated, so there is a >> small period of time that gic_handle_irq can read/acknowledge the FIQ. > > Makes sense. > > Avoiding this problem on GICv2 is easy (thanks to the aliased intack > register) but on iMX.6 we have only a GICv1. > > >> I can reduce the number of occurrences (not prevent it) by adding the >> following hack to irq-gic.c >> @@ -297,10 +309,12 @@ static asmlinkage void __exception_irq_entry >> gic_handle_irq(struct pt_regs *regs >> u32 irqstat, irqnr; >> struct gic_chip_data *gic = &gic_data[0]; >> void __iomem *cpu_base = gic_data_cpu_base(gic); >> >> do { >> + while(readl_relaxed(gic_data_dist_base(gic) + GIC_DIST_PENDING_SET) >> & (1 << 30)) >> + printk(KERN_ERR "TEMP: gic_handle_irq: wait for FIQ exception\n"); >> irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK); >> irqnr = irqstat & ~0x1c00; > > I've made a more complete attempt to fix this. Could you test the > following? (and be prepared to fuzz the line numbers) Thanks Daniel, I have tested it, see the comments below. > > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c > index 73ae896..309bf2c 100644 > --- a/drivers/irqchip/irq-gic.c > +++ b/drivers/irqchip/irq-gic.c > @@ -303,6 +303,28 @@ static int gic_set_wake(struct irq_data *d, > unsigned int on) > #define gic_set_wake NULL > #endif > > +/* Check for group 0 interrupt spuriously acked as a normal IRQ. This > + * workaround will only work for level triggered interrupts (and in > + * its current form is actively harmful on systems that don't support > + * FIQ). > + */ > +static u32 gic_handle_spurious_group_0(struct gic_chip_data *gic, u32 > irqstat) > +{ > + u32 irqnr = irqstat & GICC_IAR_INT_ID_MASK; > + > + if (!gic_data_fiq_enable(gic) || irqnr >= 1021) > + return irqnr; > + > + if (readl_relaxed(gic_data_dist_base(gic) + GIC_DIST_IGROUP + > + (irqnr / 32 * 4)) & > + BIT(irqnr % 32)) > + return irqnr; > + > + /* this interrupt was spurious (needs to be handled as FIQ) */ > + writel_relaxed(irqstat, gic_data_cpu_base(gic) + GIC_CPU_EOI); This will NOT work, because of the note I mentioned above: "The FIQ exception will not occur anymore after gic_handle_irq read/acknowledges the FIQ by reading the GIC_CPU_INTACK register" So with this code you will say End Of Interrupt at the GIC level, without actually handling the interrupt, so you are missing an interrupt. I did the following test to confirm the missing interrupt: I have changed the periodic timer interrupt by an one-shot timer interrupt. The one-shot timer interrupt is programmed by the FIQ handler for the next FIQ interrupt. As expected: When the problem occurs, no more FIQ interrupts are generated. > + return 1023; > +} > + > static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) > { > u32 irqstat, irqnr; > @@ -310,8 +332,10 @@ static void __exception_irq_entry > gic_handle_irq(struct pt_regs *regs) > void __iomem *cpu_base = gic_data_cpu_base(gic); > > do { > + local_fiq_disable(); It is a bit weird to disable the "Non-Maskable Interrupt" at every interrupt, because of a problem that occurs once every few thousand interrupts > irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK); > - irqnr = irqstat & GICC_IAR_INT_ID_MASK; > + irqnr = gic_handle_spurious_group_0(gic, irqstat); > + local_fiq_enable(); > > if (likely(irqnr > 15 && irqnr < 1021)) { > irqnr = irq_find_mapping(gic->domain, irqnr); The following type of changes could fix the problem for me: @@ -290,19 +290,66 @@ static int gic_set_wake(struct irq_data *d, unsigned int on) #else #define gic_set_wake NULL #endif +extern void (*fiq_handler)(void); + +/* Check for group 0 interrupt spuriously acked as a normal IRQ. This + * workaround will only work for level triggered interrupts (and in + * its current form is actively harmful on systems that don't support + * FIQ). + */ +static u32 gic_handle_spurious_group_0(struct gic_chip_data *gic, u32 irqstat) +{ + u32 irqnr = irqstat & ~0x1c00; + + if (!gic_data_fiq_enable(gic) || irqnr >= 1021) + return irqnr; + + if (readl_relaxed(gic_data_dist_base(gic) + GIC_DIST_IGROUP + + (irqnr / 32 * 4)) & BIT(irqnr % 32)) + return irqnr; + + /* + * The FIQ should be disabled before the next FIQ interrupt occurs, + * so this only works when the next FIQ interrupt is not "too fast" + * after the previous one. + */ + local_fiq_disable(); + + /* + * Notes: + * - The FIQ exception will not occur anymore for this current + * interrupt, because gic_handle_irq has already read/acknowledged + * the GIC_CPU_INTACK register. So handle the FIQ here. + * - The fiq_handler below should not call ack_fiq and eoi_fiq, + * because rereading the GIC_CPU_INTACK register returns spurious + * interrupt ID 0x3FF. So probably you will have to add sometime like + * the following to fiq_handler: + * u32 cpsr, irqstat; + * __asm__("mrs %0, cpsr" : "=r" (cpsr)); + * if ((cpsr & MODE_MASK) == FIQ_MODE) + * irqstat = ack_fiq(); + */ + (*fiq_handler)(); + + writel_relaxed(irqstat, gic_data_cpu_base(gic) + GIC_CPU_EOI); + local_fiq_enable(); + + return 1023; +} + static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) { u32 irqstat, irqnr; struct gic_chip_data *gic = &gic_data[0]; void __iomem *cpu_base = gic_data_cpu_base(gic); do { irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK); - irqnr = irqstat & ~0x1c00; + irqnr = gic_handle_spurious_group_0(gic, irqstat); if (likely(irqnr > 15 && irqnr < 1021)) { Regards, Harro -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/