Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752780AbbFLLOh (ORCPT ); Fri, 12 Jun 2015 07:14:37 -0400
Received: from mga03.intel.com ([134.134.136.65]:11980 "EHLO mga03.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750893AbbFLLOg (ORCPT ); Fri, 12 Jun 2015 07:14:36 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.13,601,1427785200"; d="scan'208";a="742079414"
Message-ID: <557ABE8B.1020705@intel.com>
Date: Fri, 12 Jun 2015 14:12:11 +0300
From: Adrian Hunter
Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki,
        Business Identity Code: 0357606 - 4, Domiciled in Helsinki
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Peter Zijlstra
CC: Andi Kleen , Arnaldo Carvalho de Melo , Ingo Molnar ,
        linux-kernel@vger.kernel.org, Jiri Olsa , Stephane Eranian ,
        mathieu.poirier@linaro.org, Pawel Moll
Subject: Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches
References: <1433859670-10806-1-git-send-email-adrian.hunter@intel.com>
        <20150611141548.GW19282@twins.programming.kicks-ass.net>
In-Reply-To: <20150611141548.GW19282@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2685
Lines: 73

On 11/06/15 17:15, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 05:21:10PM +0300, Adrian Hunter wrote:
>> Tracepoints are no good at all for non-privileged users
>> because they need either CAP_SYS_ADMIN or
>> /proc/sys/kernel/perf_event_paranoid <= -1.
>>
>> On the other hand, kernel software events need either
>> CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.
>
> So while I think it makes sense to allow some tracepoint outside of that
> priv level, IOW have a per tracepoint priv level filter thingy, I don't
> think sched_switch() is one of those because it explicitly exposes
> timing information on other tasks.
>
>> This new PERF_RECORD_SWITCH event does not have those problems
>> and it also has a couple of other small advantages. It is
>> easier to use because it is an auxiliary event (like mmap,
>> comm and task events) which can be enabled by setting a single
>> bit. It is smaller than sched:sched_switch and easier to parse.
>
> Right, so the one wee problem I have is that this only provides sched_in
> data, I imagine people might be interested in sched_out as well.

That is not a problem, although it would be interesting to know the
use-case. To me it seemed unreasonable to expect to analyze scheduler
behaviour without admin-level privileges, since it is inherently an
administrative activity.

>
> Typically the switch event provides prev and next and thereby is
> complete, but since we're limiting it to the one specific task, we'll
> not have the sched_out data.

That makes sense for completeness but, as I wrote, it would be
interesting to know what someone might actually use that for.

>
>> @@ -812,6 +813,18 @@ enum perf_event_type {
>>           */
>>          PERF_RECORD_ITRACE_START                = 12,
>>
>> +        /*
>> +         *
>> +         *
>> +         * struct {
>> +         *      struct perf_event_header        header;
>> +         *      u32                             pid, tid;
>> +         *      u64                             time;
>
> all 3 are already part of sample_id.

You have to decide whether you expect to be able to use an event
without sample_id. MMAP and MMAP2 both have pid, tid which are in
sample_id, LOST has id, EXIT and FORK have time, all of the
THROTTLE/UNTHROTTLE members are in sample_id etc. So it currently looks
like we expect to be able to use an event without requiring sample_id.

It doesn't affect my case either way because perf tools always sets
sample_id_all if it can.
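
To illustrate what relying on sample_id would mean for a consumer, here
is a rough sketch of recovering pid, tid and time from the sample_id
trailer that follows a non-sample record when attr.sample_id_all is set.
The field order follows the sample_id layout documented in
include/uapi/linux/perf_event.h. The SID_* macros, struct id_fields and
both helper functions are hypothetical names made up for this example;
they are not part of the patch or of perf tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the relevant PERF_SAMPLE_* bits (illustrative names). */
#define SID_TID        (1u << 1)
#define SID_TIME       (1u << 2)
#define SID_ID         (1u << 6)
#define SID_CPU        (1u << 7)
#define SID_STREAM_ID  (1u << 9)
#define SID_IDENTIFIER (1u << 16)

/* Hypothetical holder for the fields discussed in this thread. */
struct id_fields {
	uint32_t pid, tid;
	uint64_t time;
	int has_tid, has_time;
};

/* Size in bytes of the sample_id trailer for a given sample_type. */
static size_t sample_id_size(uint64_t sample_type)
{
	size_t n = 0;

	if (sample_type & SID_TID)        n += 8; /* u32 pid, tid */
	if (sample_type & SID_TIME)       n += 8; /* u64 time */
	if (sample_type & SID_ID)         n += 8; /* u64 id */
	if (sample_type & SID_STREAM_ID)  n += 8; /* u64 stream_id */
	if (sample_type & SID_CPU)        n += 8; /* u32 cpu, res */
	if (sample_type & SID_IDENTIFIER) n += 8; /* u64 id */
	return n;
}

/*
 * The trailer sits at the very end of the record, so it is located by
 * counting back from the record size. Returns 0 on success, -1 if the
 * record is too short to hold the trailer.
 */
static int read_sample_id(const uint8_t *rec, size_t rec_size,
			  uint64_t sample_type, struct id_fields *out)
{
	size_t n = sample_id_size(sample_type);
	const uint8_t *p;

	assert(rec != NULL && out != NULL);
	if (n > rec_size)
		return -1;

	p = rec + rec_size - n;
	memset(out, 0, sizeof(*out));
	if (sample_type & SID_TID) {
		memcpy(&out->pid, p, 4);
		memcpy(&out->tid, p + 4, 4);
		out->has_tid = 1;
		p += 8;
	}
	if (sample_type & SID_TIME) {
		memcpy(&out->time, p, 8);
		out->has_time = 1;
	}
	return 0;
}
```

A consumer that cannot assume sample_id_all gets nothing back from such
a helper, which is why the existing records carry their own copies of
these fields.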
>
>> +         *      struct sample_id                sample_id;
>> +         * };
>> +         */
>> +        PERF_RECORD_SWITCH                      = 13,
>
>
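
For reference, a minimal sketch of how a tool might decode the record
layout proposed in the hunk above (header, pid, tid, time, then an
optional sample_id). The struct and function names are hypothetical and
the type value 13 is the one proposed in this RFC, so it may differ from
whatever is eventually merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct perf_event_header in the perf ABI. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Value proposed in the RFC patch; an assumption, not a merged ABI value. */
#define RFC_PERF_RECORD_SWITCH 13

/* Fixed part of the proposed record; sample_id may follow it. */
struct rfc_switch_record {
	struct rec_header header;
	uint32_t pid, tid;
	uint64_t time;
};

/* The fixed part is expected to have no padding. */
_Static_assert(sizeof(struct rfc_switch_record) == 24,
	       "unexpected padding in rfc_switch_record");

/*
 * Copy the fixed part of a switch record out of a raw buffer.
 * Returns 0 on success, -1 if the buffer does not hold a complete
 * record of the proposed type.
 */
static int parse_rfc_switch(const void *buf, size_t len,
			    struct rfc_switch_record *out)
{
	struct rec_header h;

	assert(buf != NULL && out != NULL);
	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.type != RFC_PERF_RECORD_SWITCH)
		return -1;
	if (h.size < sizeof(*out) || h.size > len)
		return -1;
	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

Because pid, tid and time are in the fixed part, this works whether or
not sample_id_all was requested.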