From: kan.liang@intel.com
To: peterz@infradead.org, mingo@redhat.com, eranian@google.com, linux-kernel@vger.kernel.org
Cc: alexander.shishkin@linux.intel.com, acme@redhat.com, jolsa@redhat.com, torvalds@linux-foundation.org, tglx@linutronix.de, vincent.weaver@maine.edu, ak@linux.intel.com, Kan Liang
Subject: [PATCH V2 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
Date: Fri, 9 Jun 2017 10:28:03 -0700
Message-Id: <1497029283-3332-2-git-send-email-kan.liang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1497029283-3332-1-git-send-email-kan.liang@intel.com>
References: <1497029283-3332-1-git-send-email-kan.liang@intel.com>

From: Kan Liang

The NMI watchdog uses either the fixed cycles counter or a generic
cycles counter. This causes a lot of conflicts with users of the PMU who
want to run a full group including the fixed cycles counter, for example
the --topdown support recently added to perf stat. The code then has to
fall back to not using groups, which can cause measurement inaccuracy
due to multiplexing errors.

This patch switches the NMI watchdog to use reference cycles on Intel
systems. This is actually more accurate than cycles, because cycles can
tick faster than the measured CPU frequency due to Turbo mode. The ref
cycles always tick at their frequency, or slower when the system is
idling. That means the NMI watchdog can never expire too early, unlike
with cycles.

The reference cycles tick roughly at the frequency of the TSC, so the
same period computation can be used.

On older platforms such as Silvermont/Airmont, Core2 and Atom, don't do
the switch. Their NMI watchdog still uses the cycles event.

Signed-off-by: Andi Kleen
---

Changes since V1:
 - Don't use the ref-cycles NMI watchdog on older platforms.

 arch/x86/events/core.c | 10 ++++++++++
 include/linux/nmi.h    |  1 +
 kernel/watchdog_hld.c  |  7 +++++++
 3 files changed, 18 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18f8d37..e4c9f11 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2625,3 +2625,13 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->events_mask_len	= x86_pmu.events_mask_len;
 }
 EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
+int hw_nmi_get_event(void)
+{
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
+	    (x86_pmu.ref_cycles_rep))
+		return PERF_COUNT_HW_REF_CPU_CYCLES;
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+#endif
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index aa3cd08..b2fa444 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -141,6 +141,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+int hw_nmi_get_event(void);
 extern int nmi_watchdog_enabled;
 extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 54a427d..f899766 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -70,6 +70,12 @@ void touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
 
+/* Can be overridden by architecture */
+__weak int hw_nmi_get_event(void)
+{
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
 	.config		= PERF_COUNT_HW_CPU_CYCLES,
@@ -165,6 +171,7 @@ int watchdog_nmi_enable(unsigned int cpu)
 
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	wd_attr->config = hw_nmi_get_event();
 
 	/* Try to register using hardware perf events */
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
-- 
2.7.4
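
Note on the period computation: the changelog argues that a period
derived from the base (TSC) frequency cannot expire early when counted
in ref cycles, but can when counted in core cycles under Turbo. The
userspace sketch below only illustrates that arithmetic. It is not part
of the patch; sample_period() is modeled on the shape of the x86
hw_nmi_get_sample_period() (frequency in kHz * 1000 * threshold), and
seconds_to_expire() plus all frequencies used with it are hypothetical
values chosen for illustration.

```c
#include <stdint.h>

/*
 * Illustrative only: period in counter ticks for a threshold given in
 * seconds, assuming the period is derived from the base frequency in kHz.
 */
static uint64_t sample_period(uint64_t base_khz, unsigned int thresh_sec)
{
	return base_khz * 1000ULL * thresh_sec;
}

/*
 * Hypothetical helper: wall-clock seconds until a counter that ticks at
 * tick_khz has counted 'period' events.
 */
static double seconds_to_expire(uint64_t period, uint64_t tick_khz)
{
	return (double)period / ((double)tick_khz * 1000.0);
}
```

With an assumed base frequency of 2.0 GHz, an assumed Turbo frequency of
3.0 GHz and a threshold of 10 seconds, sample_period(2000000, 10) is
20000000000 ticks. A counter ticking at the Turbo rate reaches that
after roughly 6.7 seconds, while one ticking at the base rate, as ref
cycles roughly do, needs the full 10 seconds.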