Date: Fri, 17 Jul 2015 13:21:06 +0100
From: Mark Rutland
To: "kan.liang@intel.com"
Cc: "a.p.zijlstra@chello.nl", "mingo@redhat.com", "acme@kernel.org",
    "eranian@google.com", "ak@linux.intel.com", "adrian.hunter@intel.com",
    "dsahern@gmail.com", "jolsa@kernel.org", "namhyung@kernel.org",
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 4/9] perf/x86: special case per-cpu core misc PMU events
Message-ID: <20150717122105.GD26091@leverpostej>
References: <1437078831-10152-1-git-send-email-kan.liang@intel.com>
 <1437078831-10152-5-git-send-email-kan.liang@intel.com>
In-Reply-To: <1437078831-10152-5-git-send-email-kan.liang@intel.com>

On Thu, Jul 16, 2015 at 09:33:46PM +0100, kan.liang@intel.com wrote:
> From: Kan Liang
>
> This patch special case per-cpu core_misc PMU events and allow them to
> be part of any hardware/software group for system-wide monitoring.
> An useful example would be to include the ASTATE/MSTATE event in a
> sampling group. This can be used to calculate the frequency during each
> sampling period, and track it over time.
>
> A new context type (perf_free_context) is introduced to indicate these
> per-cpu core misc PMU events.
> They are
>  - Free running counter
>  - Don't have any state to switch on context switch and never fails
>    to schedule
>  - No sampling support
>  - Only support system-wide monitoring
>  - per-cpu
> We also defined a new PERF event type PERF_TYPE_CORE_MISC_FREE for them.
>
> It's safe to mix cpu PMU events and CORE_MISC_FREE events in a group.
> Because when cpu PMU events disable/enable, we can disable/enable
> them at the same time without failure.

Which effectively means you're context-switching their state (given what
your enable/disable code does).

As with my earlier comments, I don't think these can be grouped with
events (not even from the same PMU given their free-running nature).
They're CPU-affine, so you can associate them with work done on that
CPU.

So as far as I can see, you should be able to handle the per-cpu misc
events in the perf_hw_context, providing you reject grouping in your
event_init functions.

What does this extra context give you?

Mark.

> Since there is no sampling support for these events. They are only
> available for group reading and system-wide monitoring.
>
> Signed-off-by: Kan Liang
> ---
>  arch/x86/kernel/cpu/perf_event_intel_core_misc.c |  8 +++++---
>  include/linux/perf_event.h                       | 10 ++++++++++
>  include/linux/sched.h                            |  1 +
>  include/uapi/linux/perf_event.h                  |  1 +
>  kernel/events/core.c                             |  9 +++++++++
>  5 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_core_misc.c b/arch/x86/kernel/cpu/perf_event_intel_core_misc.c
> index 4efe842..dad4495 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_core_misc.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_core_misc.c
> @@ -889,7 +889,6 @@ static void __init core_misc_pmus_register(void)
>
>  		type->pmu = (struct pmu) {
>  			.attr_groups	= type->pmu_group,
> -			.task_ctx_nr	= perf_invalid_context,
>  			.event_init	= core_misc_pmu_event_init,
>  			.add		= core_misc_pmu_event_add, /* must have */
>  			.del		= core_misc_pmu_event_del, /* must have */
> @@ -902,9 +901,12 @@ static void __init core_misc_pmus_register(void)
>  		if (type->type == perf_intel_core_misc_thread) {
>  			type->pmu.pmu_disable = (void *) intel_core_misc_pmu_disable;
>  			type->pmu.pmu_enable = (void *) intel_core_misc_pmu_enable;
> +			type->pmu.task_ctx_nr = perf_free_context;
> +			err = perf_pmu_register(&type->pmu, type->name, PERF_TYPE_CORE_MISC_FREE);
> +		} else {
> +			type->pmu.task_ctx_nr = perf_invalid_context;
> +			err = perf_pmu_register(&type->pmu, type->name, -1);
>  		}
> -
> -		err = perf_pmu_register(&type->pmu, type->name, -1);
>  		if (WARN_ON(err))
>  			pr_info("Failed to register PMU %s error %d\n",
>  				type->pmu.name, err);
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index fea0ddf..3538f1c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -773,6 +773,16 @@ static inline int is_hardware_event(struct perf_event *event)
>  	return event->pmu->task_ctx_nr == perf_hw_context;
>  }
>
> +static inline int is_free_event(struct perf_event *event)
> +{
> +	return event->pmu->task_ctx_nr == perf_free_context;
> +}
> +
> +static inline int has_context_event(struct perf_event *event)
> +{
> +	return event->pmu->task_ctx_nr > perf_invalid_context;
> +}
> +
>  extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
>
>  extern void ___perf_sw_event(u32, u64, struct pt_regs *, u64);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ae21f15..717f492 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1335,6 +1335,7 @@ union rcu_special {
>  struct rcu_node;
>
>  enum perf_event_task_context {
> +	perf_free_context = -2,
>  	perf_invalid_context = -1,
>  	perf_hw_context = 0,
>  	perf_sw_context,
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d97f84c..232b674 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -32,6 +32,7 @@ enum perf_type_id {
>  	PERF_TYPE_HW_CACHE			= 3,
>  	PERF_TYPE_RAW				= 4,
>  	PERF_TYPE_BREAKPOINT			= 5,
> +	PERF_TYPE_CORE_MISC_FREE		= 6,
>
>  	PERF_TYPE_MAX,				/* non-ABI */
>  };
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9077867..995b436 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8019,6 +8019,15 @@ SYSCALL_DEFINE5(perf_event_open,
>  	}
>
>  	/*
> +	 * Special case per-cpu free counter events and allow them to be part of
> +	 * any hardware/software group for system-wide monitoring.
> +	 */
> +	if (group_leader && !task &&
> +	    is_free_event(event) &&
> +	    has_context_event(group_leader))
> +		pmu = group_leader->pmu;
> +
> +	/*
>  	 * Get the target context (task or percpu):
>  	 */
>  	ctx = find_get_context(pmu, task, event);
> --
> 1.8.3.1
>
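
For illustration only: the suggestion in the reply (handle these events in
the existing hardware context and reject grouping in event_init) might look
roughly like the userspace model below. It is a sketch, not kernel code and
not part of the patch. struct sketch_event and sketch_misc_event_init() are
hypothetical names, and the three fields are simplified stand-ins for what a
real event_init callback would read from struct perf_event.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct perf_event. */
struct sketch_event {
	int cpu;                           /* -1 means task-bound, not per-cpu */
	unsigned long long sample_period;  /* non-zero means sampling requested */
	struct sketch_event *group_leader; /* points to itself when standalone */
};

/*
 * Model of an event_init callback for a free-running, per-cpu counter.
 * Returns 0 if the event is acceptable, -EINVAL otherwise.
 */
static int sketch_misc_event_init(const struct sketch_event *event)
{
	/* No sampling support. */
	if (event->sample_period)
		return -EINVAL;

	/* System-wide (per-cpu) monitoring only. */
	if (event->cpu < 0)
		return -EINVAL;

	/* Reject grouping: the event must be its own group leader. */
	if (event->group_leader != event)
		return -EINVAL;

	return 0;
}
```

Note that this check only stops such an event from joining another group as
a sibling. Stopping other events from joining a group led by one of these
events would need a separate check, which is not shown here.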