Linux 4.3 introduced two new record types for recording context
switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.
The advantage over the existing tracepoint and software context
switch events is primarily that full switch in/out data can be
gathered even in the face of restrictive perf_event_paranoid
settings.
Signed-off-by: Vince Weaver <[email protected]>
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 68b99bb..04a0cf5 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -243,8 +243,9 @@ struct perf_event_attr {
comm_exec : 1, /* flag comm events that are
due to exec */
use_clockid : 1, /* use clockid for time fields */
+ context_switch : 1, /* context switch data */
- __reserved_1 : 38;
+ __reserved_1 : 37;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1112,6 +1113,21 @@ field.
This can make it easier to correlate perf sample times with
timestamps generated by other tools.
.TP
+.IR "context_switch" " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This enables the generation of
+.B PERF_RECORD_SWITCH
+records when a context switch occurs.
+It also enables the generation of
+.B PERF_RECORD_SWITCH_CPU_WIDE
+records when sampling in CPU-wide mode.
+This functionality is in addition to existing tracepoint and
+software events for measuring context switches.
+The advantage of this method is that it will give full
+information event with strict
+.I perf_event_paranoid
+settings.
+.TP
.IR "wakeup_events" ", " "wakeup_watermark"
This union sets how many samples
.RI ( wakeup_events )
@@ -1792,7 +1808,8 @@ Sample happened in guest user code.
.RE
.RS
-In addition, one of the following bits can be set:
+The following three statuses are generated by
+different record types so they alias to the same bit:
.TP
.BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
.\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
@@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
if a process name change was caused by an
.BR exec (2)
system call.
-It is an alias for
-.B PERF_RECORD_MISC_MMAP_DATA
-since the two values would not be set in the same record.
+.TP
+.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+When a
+.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
+record is generated, this bit indicates that the
+context switch is away from the current process
+(instead of into the current process).
+.RE
+
+.RS
+In addition, the following bits can be set:
.TP
.B PERF_RECORD_MISC_EXACT_IP
This indicates that the content of
@@ -2583,6 +2609,59 @@ struct {
.I lost
the number of potentially lost samples.
.RE
+.TP
+.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This record indicates a context switch has happened.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+ struct perf_event_header header;
+ struct sample_id sample_id;
+};
+.fi
+.TP
+.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+As with
+.B PERF_RECORD_SWITCH
+this record indicates a context switch has happened,
+but it only occurs when sampling in CPU-wide mode
+and provides additional information on the process
+being switched to/from.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+ struct perf_event_header header;
+ u32 next_prev_pid;
+ u32 next_prev_tid;
+ struct sample_id sample_id;
+};
+.fi
+.RS
+.TP
+.I next_prev_pid
+The process ID of the previous (if switching in)
+or next (if switching out) process on the CPU.
+.TP
+.I next_prev_tid
+The thread ID of the previous (if switching in)
+or next (if switching out) thread on the CPU.
+.RE
.RE
.SS Overflow handling
Events can be set to notify when a threshold is crossed,
Hi Vince,
On 10/18/2016 07:22 PM, Vince Weaver wrote:
>
> Linux 4.3 introduced two new record types for recording context
> switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.
>
> The advantage over the existing tracepoint and software context
> switch events is primarily that full switch in/out data can be
> gathered even in the face of restrictive perf_event_paranoid
> settings.
>
> Signed-off-by: Vince Weaver <[email protected]>
Thanks! Applied. One query below.
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 68b99bb..04a0cf5 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -1112,6 +1113,21 @@ field.
> This can make it easier to correlate perf sample times with
> timestamps generated by other tools.
> .TP
> +.IR "context_switch" " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +This enables the generation of
> +.B PERF_RECORD_SWITCH
> +records when a context switch occurs.
> +It also enables the generation of
> +.B PERF_RECORD_SWITCH_CPU_WIDE
> +records when sampling in cpu-wide mode.
> +This functionality is in addition to existing tracepoint and
> +software events for measuring context switches.
> +The advantage of this method is that it will give full
s/give full/give a full/
ok?
> +information event with strict
> +.I perf_event_paranoid
> +settings.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
On Wed, 19 Oct 2016, Michael Kerrisk (man-pages) wrote:
> > diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> > index 68b99bb..04a0cf5 100644
> > +.B PERF_RECORD_SWITCH_CPU_WIDE
> > +records when sampling in cpu-wide mode.
> > +This functionality is in addition to existing tracepoint and
> > +software events for measuring context switches.
> > +The advantage of this method is that it will give full
>
> s/give full/give a full/
>
> ok?
>
> > +information event with strict
> > +.I perf_event_paranoid
> > +settings.
What I meant to say was
"it will give full information *even* with strict perf_event_paranoid
settings"
Maybe saying something like "despite strict settings" would be better
wording.
Not sure how I missed that typo; apparently my fingers are too used to
typing "event".
Vince
On 10/19/2016 05:14 PM, Vince Weaver wrote:
> What I meant to say was
>
> "it will give full information *even* with strict perf_event_paranoid
> settings"
>
> Maybe saying something like "despite strict settings" would be better
> wording.
>
> not sure how I missed that typo, apparently my fingers are used to typing
> "event" too much.
Thanks, Vince. Fixed now.
Cheers,
Michael
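
For readers following the thread, here is a rough sketch of how a consumer
might interpret the two record types the patch documents. It is not taken
from the patch or from the kernel sources. The demo_* names are hypothetical,
and the numeric constants are local mirrors of what <linux/perf_event.h>
defines on Linux 4.3 and later (PERF_RECORD_SWITCH = 14,
PERF_RECORD_SWITCH_CPU_WIDE = 15, PERF_RECORD_MISC_SWITCH_OUT = 1 << 13).
Real code should include that header rather than copy the values. The sketch
works on a single record already copied out of the mmap ring buffer and
ignores the trailing sample_id.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: local mirrors of values from <linux/perf_event.h>. */
enum {
    DEMO_PERF_RECORD_SWITCH          = 14,
    DEMO_PERF_RECORD_SWITCH_CPU_WIDE = 15,
    DEMO_PERF_RECORD_MISC_SWITCH_OUT = 1 << 13
};

/* Same layout as struct perf_event_header. */
struct demo_event_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;
};

/* Hypothetical result type for this sketch. */
struct demo_switch_info {
    int      is_switch;      /* 1 if the record is one of the two switch types */
    int      cpu_wide;       /* 1 for PERF_RECORD_SWITCH_CPU_WIDE */
    int      switched_out;   /* 1 if PERF_RECORD_MISC_SWITCH_OUT is set */
    uint32_t next_prev_pid;  /* valid only when cpu_wide is 1 */
    uint32_t next_prev_tid;  /* valid only when cpu_wide is 1 */
};

/* Decode one record; returns an all-zero struct for anything else. */
static struct demo_switch_info demo_decode_switch(const void *rec, size_t len)
{
    struct demo_switch_info info;
    struct demo_event_header hdr;
    const unsigned char *p = rec;

    memset(&info, 0, sizeof(info));
    if (len < sizeof(hdr))
        return info;
    memcpy(&hdr, p, sizeof(hdr));
    if (hdr.size < sizeof(hdr) || (size_t)hdr.size > len)
        return info;

    if (hdr.type != DEMO_PERF_RECORD_SWITCH &&
        hdr.type != DEMO_PERF_RECORD_SWITCH_CPU_WIDE)
        return info;

    /* The misc bit is only meaningful here because the type is a switch
     * record; other record types reuse the same bit for other meanings. */
    info.switched_out = (hdr.misc & DEMO_PERF_RECORD_MISC_SWITCH_OUT) != 0;

    if (hdr.type == DEMO_PERF_RECORD_SWITCH_CPU_WIDE) {
        if ((size_t)hdr.size < sizeof(hdr) + 2 * sizeof(uint32_t))
            return info;   /* truncated record, report as not decoded */
        memcpy(&info.next_prev_pid, p + sizeof(hdr), sizeof(uint32_t));
        memcpy(&info.next_prev_tid, p + sizeof(hdr) + sizeof(uint32_t),
               sizeof(uint32_t));
        info.cpu_wide = 1;
    }
    info.is_switch = 1;
    return info;
}
```

The type is checked before the misc bit because, as the patch notes,
PERF_RECORD_MISC_SWITCH_OUT aliases PERF_RECORD_MISC_MMAP_DATA and
PERF_RECORD_MISC_COMM_EXEC, so the bit says nothing until the record type
is known.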