From: kan.liang@intel.com
To: a.p.zijlstra@chello.nl
Cc: mingo@redhat.com, acme@kernel.org, eranian@google.com, ak@linux.intel.com, mark.rutland@arm.com, adrian.hunter@intel.com, dsahern@gmail.com, jolsa@kernel.org, namhyung@kernel.org, linux-kernel@vger.kernel.org, Kan Liang
Subject: [PATCH 1/9] perf/x86: Add Intel core misc PMUs support
Date: Thu, 16 Jul 2015 16:33:43 -0400
Message-Id: <1437078831-10152-2-git-send-email-kan.liang@intel.com>
In-Reply-To: <1437078831-10152-1-git-send-email-kan.liang@intel.com>
References: <1437078831-10152-1-git-send-email-kan.liang@intel.com>

From: Kan Liang

There are miscellaneous free-running (read-only) counters in the core.
These counters may be used simultaneously by other tools, such as
turbostat. However, it still makes sense to implement them in perf,
because we can conveniently collect them together with other events
and use them from tools without special MSR access code.
Furthermore, free-running counters are handled very differently from
ordinary PMU counters, so it makes sense to put them into a separate
PMU.

These counters include TSC, IA32_APERF, IA32_MPERF, IA32_PPERF,
SMI_COUNT, CORE_C*_RESIDENCY and PKG_C*_RESIDENCY.

This patch adds new PMUs to support these counters, including helper
functions that add/delete events.

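To illustrate what "without special MSR access code" could look like
from a tool's point of view, here is a minimal user-space sketch that
counts one of these events through perf_event_open(2). It assumes this
patch is applied, so that the PMU shows up as core_misc with its dynamic
type in /sys/devices/core_misc/type and the tsc event uses code 0x05 in
config bits 0-7. The helper names read_pmu_type(),
open_cpu_wide_counter() and read_counter() are hypothetical and are not
part of the patch.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a dynamically allocated PMU type from a sysfs "type" file.
 * Returns the type, or -1 if the file is missing or unparsable
 * (for example on a kernel without this patch).
 */
int read_pmu_type(const char *path)
{
	FILE *f = fopen(path, "r");
	int type = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/*
 * Open a counting-only event on one CPU. Returns an fd, or -1 on error.
 * Assumption: the event code lives in config bits 0-7, as the "format"
 * attribute in this patch describes.
 */
int open_cpu_wide_counter(int pmu_type, uint64_t config, int cpu)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = (uint32_t)pmu_type;
	attr.config = config;
	/* exclude_* and sample_period stay 0: event_init rejects them. */

	/* pid = -1 with cpu >= 0 asks for system-wide counting on that CPU. */
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0UL);
}

/* Read the accumulated count. Returns 0 on success, -1 on failure. */
int read_counter(int fd, uint64_t *value)
{
	ssize_t n = read(fd, value, sizeof(*value));

	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A tool would call read_pmu_type("/sys/devices/core_misc/type"), pass the
result together with config 0x05 and a CPU number to
open_cpu_wide_counter(), and take two read_counter() samples around the
workload; the difference is the TSC delta on that CPU. System-wide
events usually need elevated privileges, so the open may fail for an
unprivileged user.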
According to the counters' scope and category, three PMUs are registered
with the perf_event core subsystem.

 - 'core_misc': The counters are available for each logical processor.
   They include TSC, IA32_APERF, IA32_MPERF, IA32_PPERF and SMI_COUNT.
 - 'power_core': The counters are available for each processor core.
   They include CORE_C*_RESIDENCY, which is power related.
 - 'power_pkg': The counters are available for each physical package.
   They include PKG_C*_RESIDENCY, which is power related.

The events are exposed in sysfs for use by perf stat and other tools.
The files are:

  /sys/devices/core_misc/events/power-aperf
  /sys/devices/core_misc/events/power-mperf
  /sys/devices/core_misc/events/power-pperf
  /sys/devices/core_misc/events/smi-count
  /sys/devices/core_misc/events/tsc
  /sys/devices/power_core/events/c*-residency
  /sys/devices/power_pkg/events/c*-residency

These events only support system-wide mode counting.
For power_core/power_pkg, measuring only one CPU per core/socket is
sufficient. The /sys/devices/power_*/cpumask file can be used by tools
to figure out which CPUs to monitor by default.

The PMU type (attr->type) is dynamically allocated and is available from
/sys/devices/core_misc/type and /sys/devices/power_*/type.

Sampling is not supported.

Here are some examples.

1. To calculate the CPU%
   CPU_Utilization = CPU_CLK_UNHALTED.REF_TSC / TSC

   $ perf stat -x, -e "ref-cycles,core_misc/tsc/" -C0 taskset -c 0 sleep 1
   3481579,,ref-cycles
   2301685567,,core_misc/tsc/

   The CPU% for sleep is 0.15%.

   $ perf stat -x, -e "ref-cycles,core_misc/tsc/" -C0 taskset -c 0 busyloop
   11924042536,,ref-cycles
   11929411840,,core_misc/tsc/

   The CPU% for busyloop is 99.95%.

2. To calculate the fraction of time when the core is running in C6 state
   CORE_C6_time% = CORE_C6_RESIDENCY / TSC

   $ perf stat -x, -e"power_core/c6-residency/,core_misc/tsc/" -C0 -- taskset -c 0 sleep 1
   2287199396,,power_core/c6-residency/
   2297755875,,core_misc/tsc/

   For sleep, the core spends 99.5% of the time in C6 state.

   $ perf stat -x, -e"power_core/c6-residency/,core_misc/tsc/" -C0 -- taskset -c 0 busyloop
   1330044,,power_core/c6-residency/
   9932928928,,core_misc/tsc/

   For busyloop, the core spends 0.01% of the time in C6 state.

Signed-off-by: Kan Liang
---
 arch/x86/kernel/cpu/Makefile                     |   1 +
 arch/x86/kernel/cpu/perf_event_intel_core_misc.c | 890 +++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_core_misc.h |  96 +++
 3 files changed, 987 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_core_misc.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_core_misc.h

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9bff687..a516820 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_rapl.o perf_event_intel_cqm.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_pt.o perf_event_intel_bts.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_core_misc.o
 
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_snb.o \
diff --git a/arch/x86/kernel/cpu/perf_event_intel_core_misc.c b/arch/x86/kernel/cpu/perf_event_intel_core_misc.c
new file mode 100644
index 0000000..c6c82ac
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_intel_core_misc.c
@@ -0,0 +1,890 @@
+/*
+ * perf_event_intel_core_misc.c: support miscellaneous core counters
+ *
+ * Copyright (C) 2015, Intel Corp.
+ * Author: Kan Liang (kan.liang@intel.com)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Library General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Library General Public License for more details.
+ *
+ */
+
+/*
+ * This file exports misc free-running (read-only) counters in the core
+ * for perf. These counters may be used simultaneously by other tools,
+ * such as turbostat. However, it still makes sense to implement them
+ * in perf, because we can conveniently collect them together with
+ * other events and use them from tools without special MSR
+ * access code.
+ *
+ * The events only support system-wide mode counting. There is no
+ * sampling support because it is not supported by the hardware.
+ *
+ * All of these counters are specified in the Intel® 64 and IA-32
+ * Architectures Software Developer's Manual Vol3b.
+ *
+ * Architectural counters:
+ *	TSC: time-stamp counter (Section 17.13)
+ *		perf code: 0x05
+ *	APERF: Actual Performance Clock Counter (Section 14.1)
+ *		perf code: 0x01
+ *	MPERF: TSC Frequency Clock Counter (Section 14.1)
+ *		perf code: 0x02
+ *
+ * Model specific counters:
+ *	PPERF: Productive Performance Count.
+ *	       (See Section 14.4.5.1)
+ *		perf code: 0x03
+ *		Available model: SLM server
+ *	SMI_COUNT: SMI Counter
+ *		perf code: 0x04
+ *		Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW
+ *	MSR_CORE_C1_RES: CORE C1 Residency Counter
+ *		perf code: 0x06
+ *		Available model: SLM,AMT
+ *		Scope: Core (each processor core has an MSR)
+ *	MSR_CORE_C3_RESIDENCY: CORE C3 Residency Counter
+ *		perf code: 0x07
+ *		Available model: NHM,WSM,SNB,IVB,HSW,BDW
+ *		Scope: Core
+ *	MSR_CORE_C6_RESIDENCY: CORE C6 Residency Counter
+ *		perf code: 0x08
+ *		Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW
+ *		Scope: Core
+ *	MSR_CORE_C7_RESIDENCY: CORE C7 Residency Counter
+ *		perf code: 0x09
+ *		Available model: SNB,IVB,HSW,BDW
+ *		Scope: Core
+ *	MSR_PKG_C2_RESIDENCY: Package C2 Residency Counter.
+ *		perf code: 0x0a
+ *		Available model: SNB,IVB,HSW,BDW
+ *		Scope: Package (physical package)
+ *	MSR_PKG_C3_RESIDENCY: Package C3 Residency Counter.
+ *		perf code: 0x0b
+ *		Available model: NHM,WSM,SNB,IVB,HSW,BDW
+ *		Scope: Package (physical package)
+ *	MSR_PKG_C6_RESIDENCY: Package C6 Residency Counter.
+ *		perf code: 0x0c
+ *		Available model: NHM,WSM,SNB,IVB,HSW,BDW
+ *		Scope: Package (physical package)
+ *	MSR_PKG_C7_RESIDENCY: Package C7 Residency Counter.
+ *		perf code: 0x0d
+ *		Available model: NHM,WSM,SNB,IVB,HSW,BDW
+ *		Scope: Package (physical package)
+ *	MSR_PKG_C8_RESIDENCY: Package C8 Residency Counter.
+ *		perf code: 0x0e
+ *		Available model: HSW ULT only
+ *		Scope: Package (physical package)
+ *	MSR_PKG_C9_RESIDENCY: Package C9 Residency Counter.
+ *		perf code: 0x0f
+ *		Available model: HSW ULT only
+ *		Scope: Package (physical package)
+ *	MSR_PKG_C10_RESIDENCY: Package C10 Residency Counter.
+ *		perf code: 0x10
+ *		Available model: HSW ULT only
+ *		Scope: Package (physical package)
+ *	MSR_SLM_PKG_C6_RESIDENCY: Package C6 Residency Counter for SLM.
+ *		perf code: 0x11
+ *		Available model: SLM,AMT
+ *		Scope: Package (physical package)
+ *
+ */
+
+#include "perf_event_intel_core_misc.h"
+
+static struct intel_core_misc_type *empty_core_misc[] = { NULL, };
+struct intel_core_misc_type **core_misc = empty_core_misc;
+
+static struct perf_core_misc_event_msr core_misc_events[] = {
+	{ PERF_POWER_APERF,		MSR_IA32_APERF },
+	{ PERF_POWER_MPERF,		MSR_IA32_MPERF },
+	{ PERF_POWER_PPERF,		MSR_PPERF },
+	{ PERF_SMI_COUNT,		MSR_SMI_COUNT },
+	{ PERF_TSC,			0 },
+	{ PERF_POWER_CORE_C1_RES,	MSR_CORE_C1_RES },
+	{ PERF_POWER_CORE_C3_RES,	MSR_CORE_C3_RESIDENCY },
+	{ PERF_POWER_CORE_C6_RES,	MSR_CORE_C6_RESIDENCY },
+	{ PERF_POWER_CORE_C7_RES,	MSR_CORE_C7_RESIDENCY },
+	{ PERF_POWER_PKG_C2_RES,	MSR_PKG_C2_RESIDENCY },
+	{ PERF_POWER_PKG_C3_RES,	MSR_PKG_C3_RESIDENCY },
+	{ PERF_POWER_PKG_C6_RES,	MSR_PKG_C6_RESIDENCY },
+	{ PERF_POWER_PKG_C7_RES,	MSR_PKG_C7_RESIDENCY },
+	{ PERF_POWER_PKG_C8_RES,	MSR_PKG_C8_RESIDENCY },
+	{ PERF_POWER_PKG_C9_RES,	MSR_PKG_C9_RESIDENCY },
+	{ PERF_POWER_PKG_C10_RES,	MSR_PKG_C10_RESIDENCY },
+	{ PERF_POWER_SLM_PKG_C6_RES,	MSR_PKG_C7_RESIDENCY },
+};
+
+EVENT_ATTR_STR(power-aperf, power_aperf, "event=0x01");
+EVENT_ATTR_STR(power-mperf, power_mperf, "event=0x02");
+EVENT_ATTR_STR(power-pperf, power_pperf, "event=0x03");
+EVENT_ATTR_STR(smi-count, smi_count, "event=0x04");
+EVENT_ATTR_STR(tsc, clock_tsc, "event=0x05");
+EVENT_ATTR_STR(c1-residency, power_core_c1_res, "event=0x06");
+EVENT_ATTR_STR(c3-residency, power_core_c3_res, "event=0x07");
+EVENT_ATTR_STR(c6-residency, power_core_c6_res, "event=0x08");
+EVENT_ATTR_STR(c7-residency, power_core_c7_res, "event=0x09");
+EVENT_ATTR_STR(c2-residency, power_pkg_c2_res, "event=0x0a");
+EVENT_ATTR_STR(c3-residency, power_pkg_c3_res, "event=0x0b");
+EVENT_ATTR_STR(c6-residency, power_pkg_c6_res, "event=0x0c");
+EVENT_ATTR_STR(c7-residency, power_pkg_c7_res, "event=0x0d");
+EVENT_ATTR_STR(c8-residency, power_pkg_c8_res, "event=0x0e");
+EVENT_ATTR_STR(c9-residency, power_pkg_c9_res, "event=0x0f"); +EVENT_ATTR_STR(c10-residency, power_pkg_c10_res, "event=0x10"); +EVENT_ATTR_STR(c6-residency, power_slm_pkg_c6_res, "event=0x11"); + +static cpumask_t core_misc_core_cpu_mask; +static cpumask_t core_misc_pkg_cpu_mask; + +static DEFINE_PER_CPU(struct core_misc_pmu *, core_misc_pmu); +static DEFINE_PER_CPU(struct core_misc_pmu *, core_misc_pmu_to_free); +static DEFINE_PER_CPU(struct core_misc_pmu *, core_misc_core_pmu); +static DEFINE_PER_CPU(struct core_misc_pmu *, core_misc_core_pmu_to_free); +static DEFINE_PER_CPU(struct core_misc_pmu *, core_misc_pkg_pmu); +static DEFINE_PER_CPU(struct core_misc_pmu *, core_misc_pkg_pmu_to_free); + +#define __GET_CORE_MISC_PMU_RETURN(core_misc_pmu) \ +{ \ + pmu = per_cpu(core_misc_pmu, event->cpu); \ + if (pmu && (pmu->pmu->type == event->pmu->type)) \ + return pmu; \ +} +static struct core_misc_pmu *get_core_misc_pmu(struct perf_event *event) +{ + struct core_misc_pmu *pmu; + + __GET_CORE_MISC_PMU_RETURN(core_misc_pmu); + __GET_CORE_MISC_PMU_RETURN(core_misc_core_pmu); + __GET_CORE_MISC_PMU_RETURN(core_misc_pkg_pmu); + + return NULL; +} + +static int core_misc_pmu_event_init(struct perf_event *event) +{ + u64 cfg = event->attr.config & CORE_MISC_EVENT_MASK; + int ret = 0; + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* + * check event is known (determines counter) + */ + if (!cfg || (cfg >= PERF_CORE_MISC_EVENT_MAX)) + return -EINVAL; + + /* unsupported modes and filters */ + if (event->attr.exclude_user || + event->attr.exclude_kernel || + event->attr.exclude_hv || + event->attr.exclude_idle || + event->attr.exclude_host || + event->attr.exclude_guest || + event->attr.sample_period) /* no sampling */ + return -EINVAL; + + /* must be done before validate_group */ + event->hw.event_base = core_misc_events[cfg-1].msr; + event->hw.config = cfg; + event->hw.idx = core_misc_events[cfg-1].id; + + return ret; +} + +static u64 
core_misc_pmu_read_counter(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 val; + + if (hwc->idx == PERF_TSC) + val = rdtsc(); + else + rdmsrl_safe(event->hw.event_base, &val); + return val; +} + +static void core_misc_pmu_event_update(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 prev_raw_count, new_raw_count; + s64 delta; + int shift = 0; + + if (hwc->idx == PERF_SMI_COUNT) + shift = 32; +again: + prev_raw_count = local64_read(&hwc->prev_count); + new_raw_count = core_misc_pmu_read_counter(event); + + if (local64_cmpxchg(&hwc->prev_count, prev_raw_count, + new_raw_count) != prev_raw_count) { + cpu_relax(); + goto again; + } + + delta = (new_raw_count << shift) - (prev_raw_count << shift); + delta >>= shift; + + local64_add(delta, &event->count); +} + +static void __core_misc_pmu_event_start(struct core_misc_pmu *pmu, + struct perf_event *event) +{ + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) + return; + + event->hw.state = 0; + list_add_tail(&event->active_entry, &pmu->active_list); + local64_set(&event->hw.prev_count, core_misc_pmu_read_counter(event)); + pmu->n_active++; +} + +static void core_misc_pmu_event_start(struct perf_event *event, int mode) +{ + struct core_misc_pmu *pmu = get_core_misc_pmu(event); + unsigned long flags; + + if (pmu == NULL) + return; + + spin_lock_irqsave(&pmu->lock, flags); + __core_misc_pmu_event_start(pmu, event); + spin_unlock_irqrestore(&pmu->lock, flags); +} + +static void core_misc_pmu_event_stop(struct perf_event *event, int mode) +{ + struct core_misc_pmu *pmu = get_core_misc_pmu(event); + struct hw_perf_event *hwc = &event->hw; + unsigned long flags; + + if (pmu == NULL) + return; + + spin_lock_irqsave(&pmu->lock, flags); + + /* mark event as deactivated and stopped */ + if (!(hwc->state & PERF_HES_STOPPED)) { + WARN_ON_ONCE(pmu->n_active <= 0); + pmu->n_active--; + + list_del(&event->active_entry); + + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + 
hwc->state |= PERF_HES_STOPPED; + } + + /* check if update of sw counter is necessary */ + if ((mode & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { + /* + * Drain the remaining delta count out of a event + * that we are disabling: + */ + core_misc_pmu_event_update(event); + hwc->state |= PERF_HES_UPTODATE; + } + spin_unlock_irqrestore(&pmu->lock, flags); +} + +static void core_misc_pmu_event_del(struct perf_event *event, int mode) +{ + core_misc_pmu_event_stop(event, PERF_EF_UPDATE); +} + +static int core_misc_pmu_event_add(struct perf_event *event, int mode) +{ + struct core_misc_pmu *pmu = get_core_misc_pmu(event); + struct hw_perf_event *hwc = &event->hw; + unsigned long flags; + + if (pmu == NULL) + return -EINVAL; + + spin_lock_irqsave(&pmu->lock, flags); + + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + + if (mode & PERF_EF_START) + __core_misc_pmu_event_start(pmu, event); + + spin_unlock_irqrestore(&pmu->lock, flags); + + return 0; +} + +static ssize_t core_misc_get_attr_cpumask(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct intel_core_misc_type *type; + int i; + + for (i = 0; core_misc[i]; i++) { + type = core_misc[i]; + if (type->pmu.type == pmu->type) { + switch (type->type) { + case perf_intel_core_misc_core: + return cpumap_print_to_pagebuf(true, buf, &core_misc_core_cpu_mask); + case perf_intel_core_misc_pkg: + return cpumap_print_to_pagebuf(true, buf, &core_misc_pkg_cpu_mask); + default: + return 0; + } + } + } + + return 0; +} + +static DEVICE_ATTR(cpumask, S_IRUGO, core_misc_get_attr_cpumask, NULL); + +static struct attribute *core_misc_pmu_attrs[] = { + &dev_attr_cpumask.attr, + NULL, +}; + +static struct attribute_group core_misc_pmu_attr_group = { + .attrs = core_misc_pmu_attrs, +}; + +DEFINE_CORE_MISC_FORMAT_ATTR(event, event, "config:0-7"); +static struct attribute *core_misc_formats_attr[] = { + &format_attr_event.attr, + NULL, +}; + +static struct 
attribute_group core_misc_pmu_format_group = { + .name = "format", + .attrs = core_misc_formats_attr, +}; + +static struct attribute *nhm_core_misc_events_attr[] = { + EVENT_PTR(power_aperf), + EVENT_PTR(power_mperf), + EVENT_PTR(smi_count), + EVENT_PTR(clock_tsc), + NULL, +}; + +static struct attribute_group nhm_core_misc_pmu_events_group = { + .name = "events", + .attrs = nhm_core_misc_events_attr, +}; + +const struct attribute_group *nhm_core_misc_attr_groups[] = { + &core_misc_pmu_format_group, + &nhm_core_misc_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type nhm_core_misc = { + .name = "core_misc", + .type = perf_intel_core_misc_thread, + .pmu_group = nhm_core_misc_attr_groups, +}; + +static struct attribute *nhm_power_core_events_attr[] = { + EVENT_PTR(power_core_c3_res), + EVENT_PTR(power_core_c6_res), + NULL, +}; + +static struct attribute_group nhm_power_core_pmu_events_group = { + .name = "events", + .attrs = nhm_power_core_events_attr, +}; + +const struct attribute_group *nhm_power_core_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &nhm_power_core_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type nhm_power_core = { + .name = "power_core", + .type = perf_intel_core_misc_pkg, + .pmu_group = nhm_power_core_attr_groups, +}; + +static struct attribute *nhm_power_pkg_events_attr[] = { + EVENT_PTR(power_pkg_c3_res), + EVENT_PTR(power_pkg_c6_res), + EVENT_PTR(power_pkg_c7_res), + NULL, +}; + +static struct attribute_group nhm_power_pkg_pmu_events_group = { + .name = "events", + .attrs = nhm_power_pkg_events_attr, +}; + +const struct attribute_group *nhm_power_pkg_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &nhm_power_pkg_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type nhm_power_pkg = { + .name = "power_pkg", + .type = perf_intel_core_misc_pkg, + .pmu_group = nhm_power_pkg_attr_groups, +}; + +static struct intel_core_misc_type 
*nhm_core_misc_types[] = { + &nhm_core_misc, + &nhm_power_core, + &nhm_power_pkg, +}; + +static struct attribute *slm_power_core_events_attr[] = { + EVENT_PTR(power_core_c1_res), + EVENT_PTR(power_core_c6_res), + NULL, +}; + +static struct attribute_group slm_power_core_pmu_events_group = { + .name = "events", + .attrs = slm_power_core_events_attr, +}; + +const struct attribute_group *slm_power_core_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &slm_power_core_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type slm_power_core = { + .name = "power_core", + .type = perf_intel_core_misc_pkg, + .pmu_group = slm_power_core_attr_groups, +}; + +static struct attribute *slm_power_pkg_events_attr[] = { + EVENT_PTR(power_slm_pkg_c6_res), + NULL, +}; + +static struct attribute_group slm_power_pkg_pmu_events_group = { + .name = "events", + .attrs = slm_power_pkg_events_attr, +}; + +const struct attribute_group *slm_power_pkg_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &slm_power_pkg_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type slm_power_pkg = { + .name = "power_pkg", + .type = perf_intel_core_misc_pkg, + .pmu_group = slm_power_pkg_attr_groups, +}; + +static struct intel_core_misc_type *slm_core_misc_types[] = { + &nhm_core_misc, + &slm_power_core, + &slm_power_pkg, +}; + +static struct attribute *slm_s_core_misc_events_attr[] = { + EVENT_PTR(power_aperf), + EVENT_PTR(power_mperf), + EVENT_PTR(power_pperf), + EVENT_PTR(smi_count), + EVENT_PTR(clock_tsc), + NULL, +}; + +static struct attribute_group slm_s_core_misc_pmu_events_group = { + .name = "events", + .attrs = slm_s_core_misc_events_attr, +}; + +const struct attribute_group *slm_s_core_misc_attr_groups[] = { + &core_misc_pmu_format_group, + &slm_s_core_misc_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type slm_s_core_misc = { + .name = "core_misc", + .type = perf_intel_core_misc_thread, + 
.pmu_group = slm_s_core_misc_attr_groups, +}; + +static struct intel_core_misc_type *slm_s_core_misc_types[] = { + &slm_s_core_misc, + &slm_power_core, + &slm_power_pkg, +}; + +static struct attribute *snb_power_core_events_attr[] = { + EVENT_PTR(power_core_c3_res), + EVENT_PTR(power_core_c6_res), + EVENT_PTR(power_core_c7_res), + NULL, +}; + +static struct attribute_group snb_power_core_pmu_events_group = { + .name = "events", + .attrs = snb_power_core_events_attr, +}; + +const struct attribute_group *snb_power_core_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &snb_power_core_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type snb_power_core = { + .name = "power_core", + .type = perf_intel_core_misc_core, + .pmu_group = snb_power_core_attr_groups, +}; + +static struct attribute *snb_power_pkg_events_attr[] = { + EVENT_PTR(power_pkg_c2_res), + EVENT_PTR(power_pkg_c3_res), + EVENT_PTR(power_pkg_c6_res), + EVENT_PTR(power_pkg_c7_res), + NULL, +}; + +static struct attribute_group snb_power_pkg_pmu_events_group = { + .name = "events", + .attrs = snb_power_pkg_events_attr, +}; + +const struct attribute_group *snb_power_pkg_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &snb_power_pkg_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type snb_power_pkg = { + .name = "power_pkg", + .type = perf_intel_core_misc_pkg, + .pmu_group = snb_power_pkg_attr_groups, +}; + +static struct intel_core_misc_type *snb_core_misc_types[] = { + &nhm_core_misc, + &snb_power_core, + &snb_power_pkg, + NULL, +}; + +static struct attribute *hsw_ult_power_pkg_events_attr[] = { + EVENT_PTR(power_pkg_c2_res), + EVENT_PTR(power_pkg_c3_res), + EVENT_PTR(power_pkg_c6_res), + EVENT_PTR(power_pkg_c7_res), + EVENT_PTR(power_pkg_c8_res), + EVENT_PTR(power_pkg_c9_res), + EVENT_PTR(power_pkg_c10_res), + NULL, +}; + +static struct attribute_group hsw_ult_power_pkg_pmu_events_group = { + .name = "events", + .attrs 
= hsw_ult_power_pkg_events_attr, +}; + +const struct attribute_group *hsw_ult_power_pkg_attr_groups[] = { + &core_misc_pmu_attr_group, + &core_misc_pmu_format_group, + &hsw_ult_power_pkg_pmu_events_group, + NULL, +}; + +static struct intel_core_misc_type hsw_ult_power_pkg = { + .name = "power_pkg", + .type = perf_intel_core_misc_pkg, + .pmu_group = hsw_ult_power_pkg_attr_groups, +}; + +static struct intel_core_misc_type *hsw_ult_core_misc_types[] = { + &nhm_core_misc, + &snb_power_core, + &hsw_ult_power_pkg, +}; + +#define __CORE_MISC_CPU_EXIT(_type, _cpu_mask, fn) \ +{ \ + pmu = per_cpu(core_misc_ ## _type, cpu); \ + if (pmu) { \ + id = fn(cpu); \ + target = -1; \ + for_each_online_cpu(i) { \ + if (i == cpu) \ + continue; \ + if (id == fn(i)) { \ + target = i; \ + break; \ + } \ + } \ + if (cpumask_test_and_clear_cpu(cpu, &core_misc_ ## _cpu_mask) && target >= 0) \ + cpumask_set_cpu(target, &core_misc_ ## _cpu_mask); \ + WARN_ON(cpumask_empty(&core_misc_ ## _cpu_mask)); \ + if (target >= 0) \ + perf_pmu_migrate_context(pmu->pmu, cpu, target); \ + } \ +} + +static void core_misc_cpu_exit(int cpu) +{ + struct core_misc_pmu *pmu; + int i, id, target; + + __CORE_MISC_CPU_EXIT(core_pmu, core_cpu_mask, topology_core_id); + __CORE_MISC_CPU_EXIT(pkg_pmu, pkg_cpu_mask, topology_physical_package_id); +} + +#define __CORE_MISC_CPU_INIT(_type, _cpu_mask, fn) \ +{ \ + pmu = per_cpu(core_misc_ ## _type, cpu); \ + if (pmu) { \ + id = fn(cpu); \ + for_each_cpu(i, &core_misc_ ## _cpu_mask) { \ + if (id == fn(i)) \ + break; \ + } \ + if (i >= nr_cpu_ids) \ + cpumask_set_cpu(cpu, &core_misc_ ## _cpu_mask); \ + } \ +} + +static void core_misc_cpu_init(int cpu) +{ + int i, id; + struct core_misc_pmu *pmu; + + __CORE_MISC_CPU_INIT(core_pmu, core_cpu_mask, topology_core_id); + __CORE_MISC_CPU_INIT(pkg_pmu, pkg_cpu_mask, topology_physical_package_id); +} + +#define __CORE_MISC_CPU_PREPARE(core_misc_pmu, type) \ +{ \ + pmu = per_cpu(core_misc_pmu, cpu); \ + if (pmu) \ + break; \ + pmu = 
kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu)); \ + spin_lock_init(&pmu->lock); \ + INIT_LIST_HEAD(&pmu->active_list); \ + pmu->pmu = &type->pmu; \ + per_cpu(core_misc_pmu, cpu) = pmu; \ +} + +static int core_misc_cpu_prepare(int cpu) +{ + struct core_misc_pmu *pmu; + struct intel_core_misc_type *type; + int i; + + for (i = 0; core_misc[i]; i++) { + type = core_misc[i]; + + switch (type->type) { + case perf_intel_core_misc_thread: + __CORE_MISC_CPU_PREPARE(core_misc_pmu, type) + break; + case perf_intel_core_misc_core: + __CORE_MISC_CPU_PREPARE(core_misc_core_pmu, type); + break; + case perf_intel_core_misc_pkg: + __CORE_MISC_CPU_PREPARE(core_misc_pkg_pmu, type); + break; + } + } + + return 0; +} + +#define __CORE_MISC_CPU_KREE(pmu_to_free) \ +{ \ + if (per_cpu(pmu_to_free, cpu)) { \ + kfree(per_cpu(pmu_to_free, cpu)); \ + per_cpu(pmu_to_free, cpu) = NULL; \ + } \ +} + +static void core_misc_cpu_kfree(int cpu) +{ + __CORE_MISC_CPU_KREE(core_misc_pmu_to_free); + __CORE_MISC_CPU_KREE(core_misc_core_pmu_to_free); + __CORE_MISC_CPU_KREE(core_misc_pkg_pmu_to_free); +} + +#define __CORE_MISC_CPU_DYING(pmu, pmu_to_free) \ +{ \ + if (per_cpu(pmu, cpu)) { \ + per_cpu(pmu_to_free, cpu) = per_cpu(pmu, cpu); \ + per_cpu(pmu, cpu) = NULL; \ + } \ +} + +static int core_misc_cpu_dying(int cpu) +{ + __CORE_MISC_CPU_DYING(core_misc_pmu, core_misc_pmu_to_free); + __CORE_MISC_CPU_DYING(core_misc_core_pmu, core_misc_core_pmu_to_free); + __CORE_MISC_CPU_DYING(core_misc_pkg_pmu, core_misc_pkg_pmu_to_free); + + return 0; +} +static int core_misc_cpu_notifier(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + unsigned int cpu = (long)hcpu; + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_UP_PREPARE: + core_misc_cpu_prepare(cpu); + break; + case CPU_STARTING: + core_misc_cpu_init(cpu); + break; + case CPU_UP_CANCELED: + case CPU_DYING: + core_misc_cpu_dying(cpu); + break; + case CPU_ONLINE: + case CPU_DEAD: + core_misc_cpu_kfree(cpu); + break; + case 
CPU_DOWN_PREPARE: + core_misc_cpu_exit(cpu); + break; + default: + break; + } + + return NOTIFY_OK; +} + +#define CORE_MISC_CPU(_model, _ops) { \ + .vendor = X86_VENDOR_INTEL, \ + .family = 6, \ + .model = _model, \ + .driver_data = (kernel_ulong_t)&_ops, \ + } + +static const struct x86_cpu_id core_misc_ids[] __initconst = { + CORE_MISC_CPU(0x37, slm_core_misc_types),/* Silvermont */ + CORE_MISC_CPU(0x4d, slm_s_core_misc_types),/* Silvermont Avoton/Rangely */ + CORE_MISC_CPU(0x4c, slm_core_misc_types),/* Airmont */ + CORE_MISC_CPU(0x1e, nhm_core_misc_types),/* Nehalem */ + CORE_MISC_CPU(0x1a, nhm_core_misc_types),/* Nehalem-EP */ + CORE_MISC_CPU(0x2e, nhm_core_misc_types),/* Nehalem-EX */ + CORE_MISC_CPU(0x25, nhm_core_misc_types),/* Westmere */ + CORE_MISC_CPU(0x2c, nhm_core_misc_types),/* Westmere-EP */ + CORE_MISC_CPU(0x2f, nhm_core_misc_types),/* Westmere-EX */ + CORE_MISC_CPU(0x2a, snb_core_misc_types),/* SandyBridge */ + CORE_MISC_CPU(0x2d, snb_core_misc_types),/* SandyBridge-E/EN/EP */ + CORE_MISC_CPU(0x3a, snb_core_misc_types),/* IvyBridge */ + CORE_MISC_CPU(0x3e, snb_core_misc_types),/* IvyBridge-EP/EX */ + CORE_MISC_CPU(0x3c, snb_core_misc_types),/* Haswell Core */ + CORE_MISC_CPU(0x3f, snb_core_misc_types),/* Haswell Server */ + CORE_MISC_CPU(0x46, snb_core_misc_types),/* Haswell + GT3e */ + CORE_MISC_CPU(0x45, hsw_ult_core_misc_types),/* Haswell ULT */ + CORE_MISC_CPU(0x3d, snb_core_misc_types),/* Broadwell Core-M */ + CORE_MISC_CPU(0x56, snb_core_misc_types),/* Broadwell Xeon D */ + CORE_MISC_CPU(0x47, snb_core_misc_types),/* Broadwell + GT3e */ + CORE_MISC_CPU(0x4f, snb_core_misc_types),/* Broadwell Server */ + {} +}; + +static int __init core_misc_init(void) +{ + const struct x86_cpu_id *id; + + id = x86_match_cpu(core_misc_ids); + if (!id) + return -ENODEV; + + core_misc = (struct intel_core_misc_type **)id->driver_data; + + return 0; +} + +static void __init core_misc_cpumask_init(void) +{ + int cpu, err; + + cpu_notifier_register_begin(); + + 
for_each_online_cpu(cpu) { + err = core_misc_cpu_prepare(cpu); + if (err) { + pr_info(" CPU prepare failed\n"); + cpu_notifier_register_done(); + return; + } + core_misc_cpu_init(cpu); + } + + __perf_cpu_notifier(core_misc_cpu_notifier); + + cpu_notifier_register_done(); +} + +static void __init core_misc_pmus_register(void) +{ + struct intel_core_misc_type *type; + int i, err; + + for (i = 0; core_misc[i]; i++) { + type = core_misc[i]; + + type->pmu = (struct pmu) { + .attr_groups = type->pmu_group, + .task_ctx_nr = perf_invalid_context, + .event_init = core_misc_pmu_event_init, + .add = core_misc_pmu_event_add, /* must have */ + .del = core_misc_pmu_event_del, /* must have */ + .start = core_misc_pmu_event_start, + .stop = core_misc_pmu_event_stop, + .read = core_misc_pmu_event_update, + .capabilities = PERF_PMU_CAP_NO_INTERRUPT, + }; + + err = perf_pmu_register(&type->pmu, type->name, -1); + if (WARN_ON(err)) + pr_info("Failed to register PMU %s error %d\n", + type->pmu.name, err); + } +} + +static int __init core_misc_pmu_init(void) +{ + int err; + + if (cpu_has_hypervisor) + return -ENODEV; + + err = core_misc_init(); + if (err) + return err; + + core_misc_cpumask_init(); + + core_misc_pmus_register(); + + return 0; +} +device_initcall(core_misc_pmu_init); diff --git a/arch/x86/kernel/cpu/perf_event_intel_core_misc.h b/arch/x86/kernel/cpu/perf_event_intel_core_misc.h new file mode 100644 index 0000000..0ed66e4 --- /dev/null +++ b/arch/x86/kernel/cpu/perf_event_intel_core_misc.h @@ -0,0 +1,96 @@ +/* + * Copyright (C) 2015, Intel Corp. + * Author: Kan Liang (kan.liang@intel.com) + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Library General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Library General Public License for more details. + * + */ + +#ifndef __PERF_EVENT_INTEL_CORE_MISC_H +#define __PERF_EVENT_INTEL_CORE_MISC_H + +#include +#include +#include +#include +#include "perf_event.h" + +#define CORE_MISC_HRTIMER_INTERVAL (60LL * NSEC_PER_SEC) + +struct intel_core_misc_type { + struct pmu pmu; + const char *name; + int type; + const struct attribute_group **pmu_group; +}; + +enum perf_intel_core_misc_type { + perf_intel_core_misc_thread = 0, + perf_intel_core_misc_core, + perf_intel_core_misc_pkg, +}; + +struct perf_core_misc_event_msr { + int id; + u64 msr; +}; + +enum perf_core_misc_id { + /* + * core_misc events, generalized by the kernel: + */ + PERF_POWER_APERF = 1, + PERF_POWER_MPERF = 2, + PERF_POWER_PPERF = 3, + PERF_SMI_COUNT = 4, + PERF_TSC = 5, + PERF_POWER_CORE_C1_RES = 6, + PERF_POWER_CORE_C3_RES = 7, + PERF_POWER_CORE_C6_RES = 8, + PERF_POWER_CORE_C7_RES = 9, + PERF_POWER_PKG_C2_RES = 10, + PERF_POWER_PKG_C3_RES = 11, + PERF_POWER_PKG_C6_RES = 12, + PERF_POWER_PKG_C7_RES = 13, + PERF_POWER_PKG_C8_RES = 14, + PERF_POWER_PKG_C9_RES = 15, + PERF_POWER_PKG_C10_RES = 16, + PERF_POWER_SLM_PKG_C6_RES = 17, + + PERF_CORE_MISC_EVENT_MAX, /* non-ABI */ +}; + +/* + * event code: LSB 8 bits, passed in attr->config + * any other bit is reserved + */ +#define CORE_MISC_EVENT_MASK 0xFFULL + +#define DEFINE_CORE_MISC_FORMAT_ATTR(_var, _name, _format) \ +static ssize_t __core_misc_##_var##_show(struct kobject *kobj, \ + struct kobj_attribute *attr, \ + char *page) \ +{ \ + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ + return snprintf(page, sizeof(_format) + 2, _format "\n"); \ +} \ +static struct kobj_attribute format_attr_##_var = \ + __ATTR(_name, 0444, __core_misc_##_var##_show, NULL) + + +struct core_misc_pmu { + spinlock_t lock; + 
+	int n_active;
+	struct list_head active_list;
+	struct intel_core_misc_type *core_misc_type;
+	struct pmu *pmu;
+};
+#endif
-- 
1.8.3.1
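
A note on the delta arithmetic in core_misc_pmu_event_update(): the
patch shifts both raw values left by "shift" bits before subtracting
(shift = 32 for SMI_COUNT, 0 otherwise), so a counter narrower than 64
bits still yields the right delta after it wraps. The stand-alone sketch
below shows the same idea in user-space C. It is an illustration and not
the patch's code: counter_delta() and its width parameter are
hypothetical names, and it uses unsigned arithmetic throughout, whereas
the patch keeps the delta in an s64.

```c
#include <stdint.h>

/*
 * Delta between two raw reads of a free-running counter that is only
 * "width" bits wide (1 <= width <= 64).
 *
 * Shifting both samples up so the counter's top bit lands in bit 63
 * makes the 64-bit subtraction wrap exactly where the hardware counter
 * wraps; shifting the result back down gives the delta modulo 2^width.
 */
uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int width)
{
	unsigned int shift = 64 - width;
	uint64_t delta = (now << shift) - (prev << shift);

	return delta >> shift;
}
```

For example, a 32-bit counter that reads 0xFFFFFFFB and later reads 5
has advanced by 10, which is what counter_delta(0xFFFFFFFB, 5, 32)
returns. The result is only meaningful if the counter wrapped at most
once between the two reads.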