Date: Tue, 2 Dec 2014 14:49:14 +0000
From: Mark Rutland
To: Daniel Thompson
Cc: Russell King, Will Deacon, linaro-kernel@lists.linaro.org,
	Peter Zijlstra, patches@linaro.org, Linus Walleij,
	linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo, Ingo Molnar,
	Paul Mackerras, Sascha Hauer, John Stultz, Thomas Gleixner,
	Shawn Guo, Sumit Semwal, linux-arm-kernel@lists.infradead.org,
	Lucas Stach
Subject: Re: [PATCH] arm: perf: Directly handle SMP platforms with one SPI
Message-ID: <20141202144914.GK23671@leverpostej>
References: <1416581603-30557-1-git-send-email-daniel.thompson@linaro.org>
 <1417021147-20735-1-git-send-email-daniel.thompson@linaro.org>
In-Reply-To: <1417021147-20735-1-git-send-email-daniel.thompson@linaro.org>

Hi Daniel,

On Wed, Nov 26, 2014 at 04:59:07PM +0000, Daniel Thompson wrote:
> Some ARM platforms mux the PMU interrupt of every core into a single
> SPI. On such platforms, if the PMU of any core except 0 raises an
> interrupt then it cannot be serviced and eventually, if you are lucky,
> the spurious irq detection might forcefully disable the interrupt.
>
> On these SoCs it is not possible to determine which core raised the
> interrupt, so work around this issue by queuing irqwork on the other
> cores whenever the primary interrupt handler is unable to service the
> interrupt.
>
> The u8500 platform has an alternative workaround that dynamically
> alters the affinity of the PMU interrupt. This workaround logic is no
> longer required, so the original code is removed, as is the hook it
> relied upon.

I agree that the workaround for this design "feature" should live in
the architectural perf code rather than in board files.

I had intended to implement that atop my heterogeneous CPUs series [1]
(so as to only target the subset of cores for a given PMU), but I only
had an affinity bouncing implementation, and the general approach looks
like it has the potential to be far better in terms of latency.

However, I have some concerns with the implementation below.

> Tested on imx6q (which has four cores/PMUs all muxed to a single SPI).
>
> Signed-off-by: Daniel Thompson
> ---
>
> Notes:
>     v2:
>     * Fixed build problems on systems without SMP.
>
>     v1:
>     * Thanks to Lucas Stach, Russell King and Thomas Gleixner for
>       critiquing an older, completely different way to tackle the
>       same problem.
>
>  arch/arm/include/asm/pmu.h       |  12 ++++
>  arch/arm/kernel/perf_event.c     |  13 ++--
>  arch/arm/kernel/perf_event_cpu.c | 126 +++++++++++++++++++++++++++++++++++++++
>  arch/arm/mach-ux500/cpu-db8500.c |  29 ---------
>  4 files changed, 143 insertions(+), 37 deletions(-)
>
> diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
> index 0b648c541293..771201ff0988 100644
> --- a/arch/arm/include/asm/pmu.h
> +++ b/arch/arm/include/asm/pmu.h
> @@ -81,6 +81,12 @@ struct pmu_hw_events {
>          raw_spinlock_t pmu_lock;
>  };
>
> +struct arm_pmu_work {
> +        struct irq_work work;
> +        struct arm_pmu *arm_pmu;
> +        atomic_t ret;
> +};
> +
>  struct arm_pmu {
>          struct pmu pmu;
>          cpumask_t active_irqs;
> @@ -108,6 +114,12 @@ struct arm_pmu {
>          u64 max_period;
>          struct platform_device *plat_device;
>          struct pmu_hw_events *(*get_hw_events)(void);
> +#ifdef CONFIG_SMP
> +        irqreturn_t (*handle_irq_none)(struct arm_pmu *);

I don't think this needs to live on the struct arm_pmu; we're unlikely
to need a different callback given this is already generic to CPUs, so
we can just call it directly. Once Will's arm-perf-3.19 tag hits
mainline (with my perf cleanups series) the CCI will be decoupled [2]
and the arm_pmu framework will only be used by CPU PMUs.

> +        int single_irq;

With my heterogeneous PMUs series you shouldn't need this; there will
be an irq_map with a single entry instead.

> +        struct arm_pmu_work __percpu *work;

In my cleanups series I folded all the percpu data into the
pmu_hw_events. It would be nice to fold this too if possible, so as to
make allocation and freeing simpler. We already have a percpu_pmu
pointer with my series applied, so we don't need the arm_pmu pointer.
If it feels like we're straying too far from hw events data we can
rename the structure to pmu_cpu_data or something like that.

> +        atomic_t remaining_work;
> +#endif
>  };
>
>  #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index b50a770f8c99..ba67d6309e1e 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -306,22 +306,19 @@ validate_group(struct perf_event *event)
>  static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
>  {
>          struct arm_pmu *armpmu;
> -        struct platform_device *plat_device;
> -        struct arm_pmu_platdata *plat;
>          int ret;
>          u64 start_clock, finish_clock;
>
>          if (irq_is_percpu(irq))
>                  dev = *(void **)dev;
>          armpmu = dev;
> -        plat_device = armpmu->plat_device;
> -        plat = dev_get_platdata(&plat_device->dev);
>
>          start_clock = sched_clock();
> -        if (plat && plat->handle_irq)
> -                ret = plat->handle_irq(irq, dev, armpmu->handle_irq);
> -        else
> -                ret = armpmu->handle_irq(irq, dev);
> +        ret = armpmu->handle_irq(irq, dev);
> +#ifdef CONFIG_SMP
> +        if (ret == IRQ_NONE && armpmu->handle_irq_none)
> +                ret = armpmu->handle_irq_none(dev);
> +#endif
>          finish_clock = sched_clock();
>
>          perf_sample_event_took(finish_clock - start_clock);
> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> index eb2c4d55666b..5605d4a4c01f 100644
> --- a/arch/arm/kernel/perf_event_cpu.c
> +++ b/arch/arm/kernel/perf_event_cpu.c
> @@ -88,6 +88,120 @@ static void cpu_pmu_disable_percpu_irq(void *data)
>          disable_percpu_irq(irq);
>  }
>
> +#ifdef CONFIG_SMP
> +
> +/*
> + * Workaround logic that is distributed to all cores if the PMU has only
> + * a single IRQ and the CPU receiving that IRQ cannot handle it. Its
> + * job is to try to service the interrupt on the current CPU. It will
> + * also enable the IRQ again if all the other CPUs have already tried to
> + * service it.
> + */
> +static void cpu_pmu_do_percpu_work(struct irq_work *w)
> +{
> +        struct arm_pmu_work *work = container_of(w, struct arm_pmu_work, work);
> +        struct arm_pmu *cpu_pmu = work->arm_pmu;
> +
> +        atomic_set(&work->ret,
> +                   cpu_pmu->handle_irq(cpu_pmu->single_irq, cpu_pmu));
> +
> +        if (atomic_dec_and_test(&cpu_pmu->remaining_work))
> +                enable_irq(cpu_pmu->single_irq);
> +}
> +
> +/*
> + * This callback, which is enabled only on SMP platforms that are
> + * running with a single IRQ, is called when the PMU handler running in
> + * the current CPU cannot service the interrupt.
> + *
> + * It will disable the interrupt and distribute irqwork to all other
> + * processors in the system. Hopefully one of them will clear the
> + * interrupt...
> + */
> +static irqreturn_t cpu_pmu_handle_irq_none(struct arm_pmu *cpu_pmu)
> +{
> +        int num_online = num_online_cpus();
> +        irqreturn_t ret = IRQ_NONE;
> +        int cpu, cret;

s/cret/cpu_ret/ -- it's slightly more typing but easier to read IMO.

> +
> +        if (num_online <= 1)
> +                return IRQ_NONE;

Surely num_online can never be below one here?

We didn't get the online CPUs, so can't other CPUs shut down between
the call to num_online_cpus() and portions of the code below? Could
that prevent us from re-enabling the interrupt? Or are we protected
from that somehow?

> +
> +        disable_irq_nosync(cpu_pmu->single_irq);
> +        atomic_add(num_online, &cpu_pmu->remaining_work);
> +        smp_mb__after_atomic();
> +
> +        for_each_online_cpu(cpu) {
> +                struct arm_pmu_work *work = per_cpu_ptr(cpu_pmu->work, cpu);
> +
> +                if (cpu == smp_processor_id())
> +                        continue;
> +
> +                /*
> +                 * We can be extremely relaxed about memory ordering
> +                 * here. All we are doing is gathering information
> +                 * about the past to help us give a return value that
> +                 * will keep the spurious interrupt detector both happy
> +                 * *and* functional. We are not shared so we can
> +                 * tolerate the occasional spurious IRQ_HANDLED.
> +                 */
> +                cret = atomic_read(&work->ret);
> +                if (cret != IRQ_NONE)
> +                        ret = cret;

I'm a little confused as to what's going on here. Why are we checking
the return code for the work triggered by the _previous_ interrupt
before queueing up the next work item?

As far as I can tell, work->ret was never initialised, so what does
this do for the first round?

> +
> +                if (!irq_work_queue_on(&work->work, cpu))
> +                        atomic_dec(&cpu_pmu->remaining_work);

What context does the queued work run in? Will we sample the expected
set of registers (e.g. can we sample userspace)?

> +        }
> +
> +        if (atomic_dec_and_test(&cpu_pmu->remaining_work))
> +                enable_irq(cpu_pmu->single_irq);

If we initialise remaining_work to the number of online CPUs minus one,
we can drop this last bit, and another CPU will always re-enable the
interrupt.
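Something like this, for illustration (a rough, untested sketch reusing
the names from this patch, with cpu_pmu_do_percpu_work() keeping its
atomic_dec_and_test()):

        disable_irq_nosync(cpu_pmu->single_irq);

        /* This CPU never holds a reference; only the remote CPUs do. */
        atomic_add(num_online - 1, &cpu_pmu->remaining_work);
        smp_mb__after_atomic();

        for_each_online_cpu(cpu) {
                struct arm_pmu_work *work = per_cpu_ptr(cpu_pmu->work, cpu);

                if (cpu == smp_processor_id())
                        continue;

                /* If queueing fails, drop that CPU's reference here instead. */
                if (!irq_work_queue_on(&work->work, cpu) &&
                    atomic_dec_and_test(&cpu_pmu->remaining_work))
                        enable_irq(cpu_pmu->single_irq);
        }

That way the last remote CPU to finish (or the queueing-failure path
above) re-enables the interrupt, and the CPU that took the interrupt
never needs the trailing dec-and-test.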
> +
> +        return ret;
> +}
> +
> +static int cpu_pmu_single_irq_workaround_init(struct arm_pmu *cpu_pmu)
> +{
> +        struct platform_device *pmu_device = cpu_pmu->plat_device;
> +        int cpu;
> +
> +        cpu_pmu->handle_irq_none = cpu_pmu_handle_irq_none;
> +        cpu_pmu->single_irq = platform_get_irq(pmu_device, 0);
> +
> +        cpu_pmu->work = alloc_percpu(struct arm_pmu_work);
> +        if (!cpu_pmu->work) {
> +                pr_err("no memory for shared IRQ workaround\n");
> +                return -ENOMEM;
> +        }
> +
> +        for_each_possible_cpu(cpu) {
> +                struct arm_pmu_work *w = per_cpu_ptr(cpu_pmu->work, cpu);
> +
> +                init_irq_work(&w->work, cpu_pmu_do_percpu_work);
> +                w->arm_pmu = cpu_pmu;
> +        }
> +
> +        return 0;
> +}
> +
> +static void cpu_pmu_single_irq_workaround_term(struct arm_pmu *cpu_pmu)
> +{
> +        cpu_pmu->handle_irq_none = cpu_pmu_handle_irq_none;

Huh? We set that up identically in cpu_pmu_single_irq_workaround_init.

I think the handle_irq_none member should go.

> +        free_percpu(cpu_pmu->work);
> +}
> +
> +#else /* CONFIG_SMP */
> +
> +static int cpu_pmu_single_irq_workaround_init(struct arm_pmu *cpu_pmu)
> +{
> +        return 0;
> +}
> +
> +static void cpu_pmu_single_irq_workaround_term(struct arm_pmu *cpu_pmu)
> +{
> +}
> +
> +#endif /* CONFIG_SMP */
> +
>  static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
>  {
>          int i, irq, irqs;
> @@ -107,6 +221,8 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
>                  if (irq >= 0)
>                          free_irq(irq, cpu_pmu);
>          }
> +
> +        cpu_pmu_single_irq_workaround_term(cpu_pmu);
>  }
>  }
>
> @@ -162,6 +278,16 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
>
>                  cpumask_set_cpu(i, &cpu_pmu->active_irqs);
>          }
> +
> +        /*
> +         * If we are running SMP and have only one interrupt source
> +         * then get ready to share that single irq among the cores.
> +         */
> +        if (nr_cpu_ids > 1 && irqs == 1) {
> +                err = cpu_pmu_single_irq_workaround_init(cpu_pmu);
> +                if (err)
> +                        return err;
> +        }

What does this do for systems using PPIs, where there's a single
percpu interrupt (e.g. Krait)?

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300748.html
[2] https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/tag/?h=perf/updates&id=arm-perf-3.19
[3] https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=perf/updates&id=c6f85cb4305bd80658d19f7b097a7c36ef9912e2