Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161279AbaKNNqA (ORCPT ); Fri, 14 Nov 2014 08:46:00 -0500 Received: from mga11.intel.com ([192.55.52.93]:11428 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161250AbaKNNpy (ORCPT ); Fri, 14 Nov 2014 08:45:54 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,862,1389772800"; d="scan'208";a="416495670" From: Alexander Shishkin To: Peter Zijlstra Cc: Ingo Molnar , linux-kernel@vger.kernel.org, Robert Richter , Frederic Weisbecker , Mike Galbraith , Paul Mackerras , Stephane Eranian , Andi Kleen , kan.liang@intel.com, adrian.hunter@intel.com, markus.t.metzger@intel.com, mathieu.poirier@linaro.org, acme@infradead.org, Alexander Shishkin Subject: [PATCH v8 14/14] perf: add ITRACE_START record to indicate that tracing has started Date: Fri, 14 Nov 2014 15:43:47 +0200 Message-Id: <1415972627-37514-15-git-send-email-alexander.shishkin@linux.intel.com> X-Mailer: git-send-email 2.1.1 In-Reply-To: <1415972627-37514-1-git-send-email-alexander.shishkin@linux.intel.com> References: <1415972627-37514-1-git-send-email-alexander.shishkin@linux.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org For counters that generate AUX data that is bound to the context of a running task, such as instruction tracing, the decoder needs to know exactly which task is running when the event is first scheduled in, before the first sched_switch. The decoder's need to know this stems from the fact that instruction flow trace decoding will almost always require program's object code in order to reconstruct said flow and for that we need at least its pid/tid in the perf stream. To single out such instruction tracing pmus, this patch introduces ITRACE PMU capability. The reason this is not part of RECORD_AUX record is that not all pmus capable of generating AUX data need this, and the opposite is *probably* also true. While sched_switch covers for most cases, there are two problems with it: the consumer will need to process events out of order (that is, having found RECORD_AUX, it will have to skip forward to the nearest sched_switch to figure out which task it was, then go back to the actual trace to decode it) and it completely misses the case when the tracing is enabled and disabled before sched_switch, for example, via PERF_EVENT_IOC_DISABLE. Signed-off-by: Alexander Shishkin --- include/linux/perf_event.h | 4 ++++ include/uapi/linux/perf_event.h | 11 +++++++++++ kernel/events/core.c | 41 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 56 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index f126eb89e6..282721b2df 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -127,6 +127,9 @@ struct hw_perf_event { /* for tp_event->class */ struct list_head tp_list; }; + struct { /* itrace */ + int itrace_started; + }; #ifdef CONFIG_HAVE_HW_BREAKPOINT struct { /* breakpoint */ /* @@ -174,6 +177,7 @@ struct perf_event; #define PERF_PMU_CAP_AUX_NO_SG 0x02 #define PERF_PMU_CAP_AUX_SW_DOUBLEBUF 0x04 #define PERF_PMU_CAP_EXCLUSIVE 0x08 +#define PERF_PMU_CAP_ITRACE 0x10 /** * struct pmu - generic performance monitoring unit diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index ffd52c0861..78017276fe 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -750,6 +750,17 @@ enum perf_event_type { */ PERF_RECORD_AUX = 11, + /* + * Indicates that instruction trace has started + * + * struct { + * struct perf_event_header header; + * u32 pid; + * u32 tid; + * }; + */ + PERF_RECORD_ITRACE_START = 12, + PERF_RECORD_MAX, /* non-ABI */ }; diff --git a/kernel/events/core.c b/kernel/events/core.c index da2e44aedd..988ef2380a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1733,6 +1733,7 @@ static void perf_set_shadow_time(struct perf_event *event, #define MAX_INTERRUPTS (~0ULL) static void perf_log_throttle(struct perf_event *event, int enable); +static void perf_log_itrace_start(struct perf_event *event); static int event_sched_in(struct perf_event *event, @@ -1767,6 +1768,8 @@ event_sched_in(struct perf_event *event, perf_pmu_disable(event->pmu); + perf_log_itrace_start(event); + if (event->pmu->add(event, PERF_EF_START)) { event->state = PERF_EVENT_STATE_INACTIVE; event->oncpu = -1; @@ -5681,6 +5684,44 @@ static void perf_log_throttle(struct perf_event *event, int enable) perf_output_end(&handle); } +static void perf_log_itrace_start(struct perf_event *event) +{ + struct perf_output_handle handle; + struct perf_sample_data sample; + struct perf_aux_event { + struct perf_event_header header; + u32 pid; + u32 tid; + } rec; + int ret; + + if (event->parent) + event = event->parent; + + if (!(event->pmu->capabilities & PERF_PMU_CAP_ITRACE) || + event->hw.itrace_started) + return; + + event->hw.itrace_started = 1; + + rec.header.type = PERF_RECORD_ITRACE_START; + rec.header.misc = 0; + rec.header.size = sizeof(rec); + rec.pid = perf_event_pid(event, current); + rec.tid = perf_event_tid(event, current); + + perf_event_header__init_id(&rec.header, &sample, event); + ret = perf_output_begin(&handle, event, rec.header.size); + + if (ret) + return; + + perf_output_put(&handle, rec); + perf_event__output_id_sample(event, &handle, &sample); + + perf_output_end(&handle); +} + /* * Generic event overflow handling, sampling. */ -- 2.1.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/