Subject: Re: [patch] perf_event_open.2: PERF_RECORD_SWITCH support
To: Vince Weaver
Cc: mtk.manpages@gmail.com, linux-man@vger.kernel.org, linux-kernel@vger.kernel.org, Adrian Hunter, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar
From: "Michael Kerrisk (man-pages)"
Message-ID: <15d30ad2-2937-febe-d6c9-b8dded4642d8@gmail.com>
Date: Wed, 19 Oct 2016 09:00:07 +0200

Hi Vince,

On 10/18/2016 07:22 PM, Vince Weaver wrote:
>
> Linux 4.3 introduced two new record types for recording context
> switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.
>
> The advantage over the existing tracepoint and software context
> switch events is primarily that full switch in/out data can be
> gathered even in the face of restrictive perf_event_paranoid
> settings.
>
> Signed-off-by: Vince Weaver

Thanks! Applied. One query below.

> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 68b99bb..04a0cf5 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -243,8 +243,9 @@ struct perf_event_attr {
>            comm_exec      :  1,  /* flag comm events that are
>                                     due to exec */
>            use_clockid    :  1,  /* use clockid for time fields */
> +          context_switch :  1,  /* context switch data */
>
> -          __reserved_1   : 38;
> +          __reserved_1   : 37;
>
>      union {
>          __u32 wakeup_events;    /* wakeup every n events */
> @@ -1112,6 +1113,21 @@ field.
>  This can make it easier to correlate perf sample times with
>  timestamps generated by other tools.
>  .TP
> +.IR "context_switch" " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +This enables the generation of
> +.B PERF_RECORD_SWITCH
> +records when a context switch occurs.
> +It also enables the generation of
> +.B PERF_RECORD_SWITCH_CPU_WIDE
> +records when sampling in cpu-wide mode.
> +This functionality is in addition to existing tracepoint and
> +software events for measuring context switches.
> +The advantage of this method is that it will give full

s/give full/give a full/ ok?

> +information event with strict
> +.I perf_event_paranoid
> +settings.
> +.TP
>  .IR "wakeup_events" ", " "wakeup_watermark"
>  This union sets how many samples
>  .RI ( wakeup_events )
> @@ -1792,7 +1808,8 @@ Sample happened in guest user code.
>  .RE
>
>  .RS
> -In addition, one of the following bits can be set:
> +The following three statuses are generated by
> +different record types so they alias to the same bit:
>  .TP
>  .BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
>  .\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
> @@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
>  if a process name change was caused by an
>  .BR exec (2)
>  system call.
> -It is an alias for
> -.B PERF_RECORD_MISC_MMAP_DATA
> -since the two values would not be set in the same record.
> +.TP
> +.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +When a
> +.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
> +record is generated this bit indicates that the
> +context switch is away from the current process
> +(instead of in to the current process).
> +.RE
> +
> +.RS
> +In addition, the following bits can be set:
>  .TP
>  .B PERF_RECORD_MISC_EXACT_IP
>  This indicates that the content of
> @@ -2583,6 +2609,59 @@ struct {
>  .I lost
>  the number of potentially lost samples.
>  .RE
> +.TP
> +.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
> +\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +This record indicates a context switch has happened.
> +The
> +.B PERF_RECORD_MISC_SWITCH_OUT
> +bit in the
> +.I misc
> +field indicates whether it was a context switch into
> +or away from the current process.
> +
> +.in +4n
> +.nf
> +struct {
> +    struct perf_event_header header;
> +    struct sample_id sample_id;
> +};
> +.fi
> +.TP
> +.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
> +\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +As with
> +.B PERF_RECORD_SWITCH
> +this record indicates a context switch has happened,
> +but it only occurs when sampling in cpu-wide mode
> +and provides additional information on the process
> +being switched to/from.
> +The
> +.B PERF_RECORD_MISC_SWITCH_OUT
> +bit in the
> +.I misc
> +field indicates whether it was a context switch into
> +or away from the current process.
> +
> +.in +4n
> +.nf
> +struct {
> +    struct perf_event_header header;
> +    u32 next_prev_pid;
> +    u32 next_prev_tid;
> +    struct sample_id sample_id;
> +};
> +.fi
> +.RS
> +.TP
> +.I next_prev_pid
> +The process id of the previous (if switching in)
> +or next (if switching out) process on the CPU.
> +.TP
> +.I next_prev_tid
> +The thread id of the previous (if switching in)
> +or next (if switching out) thread on the CPU.
> +.RE
>  .RE
>  .SS Overflow handling
>  Events can be set to notify when a threshold is crossed,
>

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/