Date: Sat, 27 Jun 2015 10:39:21 +0200
From: Ingo Molnar
To: Prarit Bhargava
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin", Thomas Gleixner,
	Ingo Molnar, x86@kernel.org, Len Brown, Dasaratharaman Chandramouli,
	Peter Zijlstra, Borislav Petkov, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Arnaldo Carvalho de Melo
Subject: Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr
Message-ID: <20150627083921.GA13074@gmail.com>
References: <1435341131-3279-1-git-send-email-prarit@redhat.com> <20150627083354.GA12834@gmail.com>
In-Reply-To: <20150627083354.GA12834@gmail.com>

* Ingo Molnar wrote:

> So what's wrong with exposing them as a simplified PMU driver?
>
> That way we only expose the ones we want to - plus tooling can use all the rich
> perf features that can be used around this. (sampling, counting, call chains,
> etc.)

See below code from Andy that exposes a single MSR via perf.

At the core of the PMU driver is a single rdmsrl():

+static void aperfmperf_event_start(struct perf_event *event, int flags)
+{
+	u64 now;
+
+	rdmsrl(event->hw.event_base, now);
+	local64_set(&event->hw.prev_count, now);
+}

Now I think what we really want is to expose not a single MSR but multiple MSRs
in a single driver, i.e. don't have one PMU driver per MSR, but have a driver that
allows the exposure of select MSRs as counters.
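
One way to picture a single driver exposing several selected MSRs is a small table that maps the perf event config value to an MSR address. The following is only a userspace sketch of that lookup, not kernel code: the struct, the table, and msr_event_lookup() are hypothetical names, and the TSC entry is an illustrative third counter that is not part of the patch below. The MSR addresses are the architectural ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR addresses (same values as the kernel's msr-index.h). */
#define MSR_IA32_TSC	0x00000010u
#define MSR_IA32_MPERF	0x000000e7u
#define MSR_IA32_APERF	0x000000e8u

/*
 * Hypothetical whitelist entry: the index into msr_events[] plays the
 * role of the perf "config" value selected by userspace.
 */
struct msr_event {
	const char *name;	/* name that would appear under events/ */
	uint32_t msr;		/* MSR read to obtain the counter value */
};

static const struct msr_event msr_events[] = {
	{ "aperf", MSR_IA32_APERF },
	{ "mperf", MSR_IA32_MPERF },
	{ "tsc",   MSR_IA32_TSC },	/* illustrative extra entry */
};

/*
 * Translate a config value into an MSR address.
 * Returns 0 on success, -1 if the config is not on the whitelist.
 */
static int msr_event_lookup(uint64_t config, uint32_t *msr)
{
	if (config >= sizeof(msr_events) / sizeof(msr_events[0]))
		return -1;

	*msr = msr_events[config].msr;
	return 0;
}
```

In a real driver the event_init callback would presumably do this lookup and store the result in event->hw.event_base, so that only MSRs listed in the table can ever be read through perf.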
There should also be a maker/family/model filter mechanism, so that certain MSRs
are only exposed on models that are known to support them, etc.

Thanks,

	Ingo

----- Forwarded message from Andy Lutomirski -----

Date: Tue, 28 Apr 2015 14:25:37 -0700
From: Andy Lutomirski
To: Len Brown, Peter Zijlstra, "linux-kernel@vger.kernel.org"
Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, Andy Lutomirski
Subject: [RFC] x86, perf: Add an aperfmperf driver

Signed-off-by: Andy Lutomirski
---

This driver seems a little bit silly, but I can imagine it being useful.
For example, I think that turbostat could do some of its work without
being root if we had a driver like this.

Thoughts?  Would it make sense at all?  Did I wire it up right?  This is
the only PMU driver I've ever written, and it could have any number of
issues.

 arch/x86/kernel/cpu/Makefile                |   2 +
 arch/x86/kernel/cpu/perf_event_aperfmperf.c | 119 ++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/perf_event_aperfmperf.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 80091ae54c2b..fadc822efc90 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -45,6 +45,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_snb.o \
 					   perf_event_intel_uncore_snbep.o \
 					   perf_event_intel_uncore_nhmex.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_aperf_mperf.o
+obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_aperf_mperf.o
 endif
 
 
diff --git a/arch/x86/kernel/cpu/perf_event_aperfmperf.c b/arch/x86/kernel/cpu/perf_event_aperfmperf.c
new file mode 100644
index 000000000000..6e6d113bd9ce
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_aperfmperf.c
@@ -0,0 +1,119 @@
+#include
+
+#define APERFMPERF_EVENT_APERF 0
+#define APERFMPERF_EVENT_MPERF 1
+
+PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, "event=0x00");
+PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, "event=0x01");
+static struct attribute *events_attrs[] = {
+	&evattr_aperf.attr.attr,
+	&evattr_mperf.attr.attr,
+	NULL,
+};
+static struct attribute_group events_attr_group = {
+	.name = "events",
+	.attrs = events_attrs,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-63");
+static struct attribute *format_attrs[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+static struct attribute_group format_attr_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&events_attr_group,
+	&format_attr_group,
+	NULL,
+};
+
+static int aperfmperf_event_init(struct perf_event *event)
+{
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	if (event->attr.config != APERFMPERF_EVENT_APERF &&
+	    event->attr.config != APERFMPERF_EVENT_MPERF)
+		return -ENOENT;
+
+	if (event->attr.config1 != 0)
+		return -ENOENT;
+
+	/* no sampling */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    event->attr.freq           ||
+	    event->attr.sample_period) /* no sampling */
+		return -EINVAL;
+
+	event->hw.idx = -1;
+	event->hw.event_base = (event->attr.config == APERFMPERF_EVENT_APERF ?
+				MSR_IA32_APERF : MSR_IA32_MPERF);
+
+	return 0;
+}
+
+static void aperfmperf_event_update(struct perf_event *event)
+{
+	u64 prev;
+	u64 now;
+
+	rdmsrl(event->hw.event_base, now);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void aperfmperf_event_start(struct perf_event *event, int flags)
+{
+	u64 now;
+
+	rdmsrl(event->hw.event_base, now);
+	local64_set(&event->hw.prev_count, now);
+}
+
+static void aperfmperf_event_stop_or_del(struct perf_event *event, int flags)
+{
+	aperfmperf_event_update(event);
+}
+
+static int aperfmperf_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		aperfmperf_event_start(event, flags);
+
+	return 0;
+}
+
+static struct pmu pmu_aperfmperf = {
+	.task_ctx_nr	= perf_invalid_context,
+	.attr_groups	= attr_groups,
+	.event_init	= aperfmperf_event_init,
+	.add		= aperfmperf_event_add,
+	.del		= aperfmperf_event_stop_or_del,
+	.start		= aperfmperf_event_start,
+	.stop		= aperfmperf_event_stop_or_del,
+	.read		= aperfmperf_event_update,
+};
+
+static int __init aperfmperf_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
+		return -ENODEV;
+
+	perf_pmu_register(&pmu_aperfmperf, "aperfmperf", -1);
+
+	return 0;
+}
+device_initcall(aperfmperf_init);
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

----- End forwarded message -----
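
The start and update callbacks in the patch follow a snapshot-and-delta pattern: start records the current MSR value, and every later update adds only the difference since the previous snapshot. A minimal userspace sketch of that arithmetic follows. It assumes the raw counter value is passed in by the caller rather than read with rdmsrl(), and struct counter_state and both function names are hypothetical stand-ins for event->hw.prev_count and event->count. It also leaves out the atomic local64 operations the kernel version uses.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-event state kept by perf. */
struct counter_state {
	uint64_t prev_count;	/* last raw value seen */
	uint64_t count;		/* accumulated delta since start */
};

/* Mirror of the start callback: take a snapshot, accumulate nothing. */
static void counter_start(struct counter_state *s, uint64_t now)
{
	s->prev_count = now;
}

/* Mirror of the update callback: add the delta since the last snapshot. */
static void counter_update(struct counter_state *s, uint64_t now)
{
	uint64_t prev = s->prev_count;

	s->prev_count = now;
	/* Unsigned subtraction is modulo 2^64, so one wraparound is harmless. */
	s->count += now - prev;
}
```

Because the subtraction is unsigned, a raw counter that wraps from near its maximum back to a small value still yields the correct small delta, as long as it wraps at most once between two updates.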