Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756181AbcLNQ5p (ORCPT );
	Wed, 14 Dec 2016 11:57:45 -0500
Received: from mx1.redhat.com ([209.132.183.28]:42884 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755004AbcLNQ5n (ORCPT );
	Wed, 14 Dec 2016 11:57:43 -0500
Date: Wed, 14 Dec 2016 17:50:36 +0100
From: Jiri Olsa
To: Peter Zijlstra , Andi Kleen
Cc: lkml , Alexander Shishkin , Vince Weaver , Ingo Molnar
Subject: [RFC] perf/x86/intel: Account interrupts for PEBS errors
Message-ID: <20161214165036.GB9180@krava>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.7.1 (2016-10-04)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 14 Dec 2016 16:50:38 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4489
Lines: 142

hi,
I'm hitting a soft lockup generated by the fuzzer, where perf
hangs in the remote_install path like:

  NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]

  task: ffff880273148000 task.stack: ffffc90002d58000
  RIP: 0010:[]  [] smp_call_function_single+0xe2/0x140
  RSP: 0018:ffffc90002d5bd60  EFLAGS: 00000202
  ...
  Call Trace:
   [] ? trace_hardirqs_on_caller+0xf5/0x1b0
   [] ? perf_cgroup_attach+0x70/0x70
   [] perf_install_in_context+0x199/0x1b0
   [] ? ctx_resched+0x90/0x90
   [] SYSC_perf_event_open+0x641/0xf90
   [] SyS_perf_event_open+0x9/0x10
   [] do_syscall_64+0x6c/0x1f0
   [] entry_SYSCALL64_slow_path+0x25/0x25

I found out that I could reproduce this with the following
2 perf commands running simultaneously:

  taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

this forces cpu 10 into an endless loop, causing the soft lockup

AFAICS the reason for this is that intel_pmu_drain_pebs_nhm does
not account event->hw.interrupts for error PEBS interrupts, so in
case you're getting ONLY errors you don't have a way to stop the
event when it's over the max_samples_per_tick limit

I added extra accounting for error PEBS and it seems to work now,
the fuzzer has been running for several hours now ;-)

also I could not reproduce with any other event, just branches
plus the additional branch stack sample type

I also fail to reproduce on anything other than the snb_x
(model 45) server

thoughts?

thanks,
jirka


---
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index be202390bbd3..f2010dbe75d6 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1389,9 +1389,13 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 			continue;
 
 		/* log dropped samples number */
-		if (error[bit])
+		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
+			if (perf_event_account_interrupt(event, 1))
+				x86_pmu_stop(event, 0);
+		}
+
 		if (counts[bit]) {
 			__intel_pmu_pebs_event(event, iregs, base,
 					       top, bit, counts[bit]);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecdb9817..7225396228ce 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1259,6 +1259,7 @@ extern void perf_event_disable(struct perf_event *event);
 extern void perf_event_disable_local(struct perf_event *event);
 extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
+extern int perf_event_account_interrupt(struct perf_event *event, int throttle);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 02c8421f8c01..93b46cc2c977 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7034,25 +7034,11 @@ static void perf_log_itrace_start(struct perf_event *event)
 	perf_output_end(&handle);
 }
 
-/*
- * Generic event overflow handling, sampling.
- */
-
-static int __perf_event_overflow(struct perf_event *event,
-				   int throttle, struct perf_sample_data *data,
-				   struct pt_regs *regs)
+int perf_event_account_interrupt(struct perf_event *event, int throttle)
 {
-	int events = atomic_read(&event->event_limit);
 	struct hw_perf_event *hwc = &event->hw;
-	u64 seq;
 	int ret = 0;
-
-	/*
-	 * Non-sampling counters might still use the PMI to fold short
-	 * hardware counters, ignore those.
-	 */
-	if (unlikely(!is_sampling_event(event)))
-		return 0;
+	u64 seq;
 
 	seq = __this_cpu_read(perf_throttled_seq);
 	if (seq != hwc->interrupts_seq) {
@@ -7070,6 +7056,30 @@ static int __perf_event_overflow(struct perf_event *event,
 		}
 	}
 
+	return ret;
+}
+
+/*
+ * Generic event overflow handling, sampling.
+ */
+
+static int __perf_event_overflow(struct perf_event *event,
+				   int throttle, struct perf_sample_data *data,
+				   struct pt_regs *regs)
+{
+	int events = atomic_read(&event->event_limit);
+	struct hw_perf_event *hwc = &event->hw;
+	int ret = 0;
+
+	/*
+	 * Non-sampling counters might still use the PMI to fold short
+	 * hardware counters, ignore those.
+	 */
+	if (unlikely(!is_sampling_event(event)))
+		return 0;
+
+	ret = perf_event_account_interrupt(event, throttle);
+
 	if (event->attr.freq) {
 		u64 now = perf_clock();
 		s64 delta = now - hwc->freq_time_stamp;
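
For illustration only, the accounting gap described in the mail above can be modelled in user space. This is a rough sketch, not kernel code: every model_* name and the MODEL_MAX_SAMPLES_PER_TICK value are made up for the example, the per-tick reset via perf_throttled_seq is left out, and the only thing it tries to show is that an event seeing ONLY error PEBS interrupts never reaches the throttle limit unless those interrupts are accounted too.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for max_samples_per_tick; the value is arbitrary. */
#define MODEL_MAX_SAMPLES_PER_TICK 8

/* Hypothetical, heavily reduced stand-in for the per-event state. */
struct model_event {
	unsigned int interrupts;	/* interrupts accounted in this tick */
	bool stopped;			/* set once the event got throttled */
};

/* Count one interrupt; return 1 once the per-tick limit is reached. */
static int model_account_interrupt(struct model_event *ev)
{
	return ++ev->interrupts >= MODEL_MAX_SAMPLES_PER_TICK;
}

/*
 * One simulated PEBS drain. 'errors' and 'counts' say whether this drain
 * saw error records and valid samples. 'account_errors' selects between
 * the behaviour described as broken (false) and the proposed one (true).
 */
static void model_drain(struct model_event *ev, int errors, int counts,
			bool account_errors)
{
	if (errors && account_errors && model_account_interrupt(ev))
		ev->stopped = true;

	if (counts && model_account_interrupt(ev))
		ev->stopped = true;
}

/*
 * Feed up to max_pmis error-only drains within a single tick. Returns
 * the number of drains it took to stop the event, or -1 if the event
 * was never stopped.
 */
static int model_run(bool account_errors, int max_pmis)
{
	struct model_event ev = { 0, false };
	int n;

	assert(max_pmis > 0);

	for (n = 0; n < max_pmis && !ev.stopped; n++)
		model_drain(&ev, 1, 0, account_errors);

	return ev.stopped ? n : -1;
}
```

In this model, model_run(false, 1000) returns -1 (the error-only event is never stopped), while model_run(true, 1000) returns MODEL_MAX_SAMPLES_PER_TICK.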