Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S940420AbcJRRWf (ORCPT ); Tue, 18 Oct 2016 13:22:35 -0400
Received: from mail-io0-f193.google.com ([209.85.223.193]:36045 "EHLO
        mail-io0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S933658AbcJRRWZ (ORCPT );
        Tue, 18 Oct 2016 13:22:25 -0400
From: Vince Weaver
X-Google-Original-From: Vince Weaver
Date: Tue, 18 Oct 2016 13:22:20 -0400 (EDT)
X-X-Sender: vince@macbook-air
To: "Michael Kerrisk (man-pages)"
cc: linux-man@vger.kernel.org, linux-kernel@vger.kernel.org,
        Adrian Hunter, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar
Subject: [patch] perf_event_open.2: PERF_RECORD_SWITCH support
Message-ID:
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4220
Lines: 140

Linux 4.3 introduced two new record types for recording context switches:
PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.
The advantage over the existing tracepoint and software context switch
events is primarily that full switch in/out data can be gathered even in
the face of restrictive perf_event_paranoid settings.

Signed-off-by: Vince Weaver

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 68b99bb..04a0cf5 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -243,8 +243,9 @@ struct perf_event_attr {
           comm_exec      :  1,  /* flag comm events that
                                    are due to exec */
           use_clockid    :  1,  /* use clockid for time fields */
+          context_switch :  1,  /* context switch data */
 
-          __reserved_1   : 38;
+          __reserved_1   : 37;
 
     union {
         __u32 wakeup_events;    /* wakeup every n events */
@@ -1112,6 +1113,21 @@ field.
 This can make it easier to correlate perf sample times with
 timestamps generated by other tools.
 .TP
+.IR "context_switch" " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This enables the generation of
+.B PERF_RECORD_SWITCH
+records when a context switch occurs.
+It also enables the generation of
+.B PERF_RECORD_SWITCH_CPU_WIDE
+records when sampling in cpu-wide mode.
+This functionality is in addition to existing tracepoint and
+software events for measuring context switches.
+The advantage of this method is that it will give full
+information even with strict
+.I perf_event_paranoid
+settings.
+.TP
 .IR "wakeup_events" ", " "wakeup_watermark"
 This union sets how many samples
 .RI ( wakeup_events )
@@ -1792,7 +1808,8 @@ Sample happened in guest user code.
 .RE
 
 .RS
-In addition, one of the following bits can be set:
+The following three statuses are generated by
+different record types so they alias to the same bit:
 .TP
 .BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
 .\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
@@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
 if a process name change was caused by an
 .BR exec (2)
 system call.
-It is an alias for
-.B PERF_RECORD_MISC_MMAP_DATA
-since the two values would not be set in the same record.
+.TP
+.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+When a
+.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
+record is generated, this bit indicates that the
+context switch is away from the current process
+(instead of into the current process).
+.RE
+
+.RS
+In addition, the following bits can be set:
 .TP
 .B PERF_RECORD_MISC_EXACT_IP
 This indicates that the content of
@@ -2583,6 +2609,59 @@ struct {
 .I lost
 the number of potentially lost samples.
 .RE
+.TP
+.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This record indicates a context switch has happened.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+    struct perf_event_header header;
+    struct sample_id sample_id;
+};
+.fi
+.TP
+.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+As with
+.B PERF_RECORD_SWITCH
+this record indicates a context switch has happened,
+but it only occurs when sampling in cpu-wide mode
+and provides additional information on the process
+being switched to/from.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+    struct perf_event_header header;
+    u32 next_prev_pid;
+    u32 next_prev_tid;
+    struct sample_id sample_id;
+};
+.fi
+.RS
+.TP
+.I next_prev_pid
+The process ID of the previous (if switching in)
+or next (if switching out) process on the CPU.
+.TP
+.I next_prev_tid
+The thread ID of the previous (if switching in)
+or next (if switching out) thread on the CPU.
+.RE
 .RE
 .SS Overflow handling
 Events can be set to notify when a threshold is crossed,