Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932231AbbGUOzQ (ORCPT ); Tue, 21 Jul 2015 10:55:16 -0400 Received: from mail-wg0-f54.google.com ([74.125.82.54]:34050 "EHLO mail-wg0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752630AbbGUOzO (ORCPT ); Tue, 21 Jul 2015 10:55:14 -0400 From: Matt Fleming To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Will Auld , Matt Fleming , Thomas Gleixner , Vikas Shivappa , Kanaka Juvva , Subject: [RESEND][PATCH] perf/x86/intel/cqm: Return cached counter value from IRQ context Date: Tue, 21 Jul 2015 15:55:09 +0100 Message-Id: <1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk> X-Mailer: git-send-email 2.1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4984 Lines: 101 From: Matt Fleming Peter reported the following potential crash which I was able to reproduce with his test program, [ 148.765788] ------------[ cut here ]------------ [ 148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260() [ 148.765797] Modules linked in: [ 148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4 [ 148.765803] ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007 [ 148.765805] 0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000 [ 148.765807] ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640 [ 148.765809] Call Trace: [ 148.765810] [] dump_stack+0x45/0x57 [ 148.765818] [] warn_slowpath_common+0x8a/0xc0 [ 148.765822] [] ? intel_cqm_stable+0x60/0x60 [ 148.765824] [] ? intel_cqm_stable+0x60/0x60 [ 148.765825] [] warn_slowpath_null+0x1a/0x20 [ 148.765827] [] smp_call_function_many+0xb6/0x260 [ 148.765829] [] ? intel_cqm_stable+0x60/0x60 [ 148.765831] [] on_each_cpu_mask+0x28/0x60 [ 148.765832] [] intel_cqm_event_count+0x7f/0xe0 [ 148.765836] [] perf_output_read+0x2a5/0x400 [ 148.765839] [] perf_output_sample+0x31a/0x590 [ 148.765840] [] ? perf_prepare_sample+0x26d/0x380 [ 148.765841] [] perf_event_output+0x47/0x60 [ 148.765843] [] __perf_event_overflow+0x215/0x240 [ 148.765844] [] perf_event_overflow+0x14/0x20 [ 148.765847] [] intel_pmu_handle_irq+0x1d4/0x440 [ 148.765849] [] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765853] [] ? vunmap_page_range+0x19d/0x2f0 [ 148.765854] [] ? unmap_kernel_range_noflush+0x11/0x20 [ 148.765859] [] ? ghes_copy_tofrom_phys+0x11e/0x2a0 [ 148.765863] [] ? native_apic_msr_write+0x2b/0x30 [ 148.765865] [] ? x2apic_send_IPI_self+0x1d/0x20 [ 148.765869] [] ? arch_irq_work_raise+0x35/0x40 [ 148.765872] [] ? irq_work_queue+0x66/0x80 [ 148.765875] [] perf_event_nmi_handler+0x26/0x40 [ 148.765877] [] nmi_handle+0x79/0x100 [ 148.765879] [] default_do_nmi+0x42/0x100 [ 148.765880] [] do_nmi+0x83/0xb0 [ 148.765884] [] end_repeat_nmi+0x1e/0x2e [ 148.765886] [] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765888] [] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765890] [] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765891] <> [] finish_task_switch+0x156/0x210 [ 148.765898] [] __schedule+0x341/0x920 [ 148.765899] [] schedule+0x37/0x80 [ 148.765903] [] ? do_page_fault+0x2f/0x80 [ 148.765905] [] schedule_user+0x1a/0x50 [ 148.765907] [] retint_careful+0x14/0x32 [ 148.765908] ---[ end trace e33ff2be78e14901 ]--- The CQM task events are not safe to be called from within interrupt context because they require performing an IPI to read the counter value on all sockets. And performing IPIs from within IRQ context is a "no-no". Make do with the last read counter value currently event in event->count when we're invoked in this context. Reported-by: Peter Zijlstra Cc: Thomas Gleixner Cc: Vikas Shivappa Cc: Kanaka Juvva Cc: Signed-off-by: Matt Fleming --- Apologies for the spam, I fat-fingered Will's email address. arch/x86/kernel/cpu/perf_event_intel_cqm.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c index 188076161c1b..63eb68b73589 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c +++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c @@ -952,6 +952,14 @@ static u64 intel_cqm_event_count(struct perf_event *event) return 0; /* + * Getting up-to-date values requires an SMP IPI which is not + * possible if we're being called in interrupt context. Return + * the cached values instead. + */ + if (unlikely(in_interrupt())) + goto out; + + /* * Notice that we don't perform the reading of an RMID * atomically, because we can't hold a spin lock across the * IPIs. -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/