perf_events may sometimes throttle an event due to creating too many
samples during a given timer tick. As of now, the perf tool will not
report on throttling, which means this is a silent error. Implement a
callback for the throttle and unthrottle events within the Python
scripting engine, which can allow scripts to detect and report when
events may have been lost due to throttling.
A python script could simply define throttle() and unthrottle()
functions to begin receiving them, e.g.:
```
from __future__ import print_function
def process_event(param_dict):
print("event cpu={} time={}".format(
param_dict["sample"]["cpu"], param_dict["sample"]["time"]))
def throttle(*args):
print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))
def unthrottle(*args):
print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
```
Signed-off-by: Stephen Brennan <[email protected]>
---
Since this mailing list thread[1] I've been wondering about ways to
detect and handle throttling. Perf will warn when events are missed
because the ring buffer filled up, but it will not give any indication
about the throttling -- except for the throttle message logged by the
kernel. Ideally, I'd like to also extend the other perf tools to give
a warning, but detecting it after the fact via a script was easiest to
implement, and most flexible for me to use. I'd appreciate feedback on
this change as well as what such a warning in perf record/report would
look like. For example:
[ perf record: WARNING: samples were throttled %u times for %u seconds ]
[1]: https://lore.kernel.org/linux-perf-users/[email protected]/T/#u
tools/perf/builtin-script.c | 13 ++++++++
.../scripting-engines/trace-event-python.c | 30 +++++++++++++++++++
tools/perf/util/trace-event.h | 3 ++
3 files changed, 46 insertions(+)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 064da7f3618d..072869a35f85 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2492,6 +2492,17 @@ process_lost_event(struct perf_tool *tool,
sample->tid);
}
+static int
+process_throttle_event(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ if (scripting_ops && scripting_ops->process_throttle)
+ scripting_ops->process_throttle(event, sample, machine);
+ return 0;
+}
+
static int
process_finished_round_event(struct perf_tool *tool __maybe_unused,
union perf_event *event,
@@ -3652,6 +3663,8 @@ int cmd_script(int argc, const char **argv)
.stat_config = process_stat_config_event,
.thread_map = process_thread_map_event,
.cpu_map = process_cpu_map_event,
+ .throttle = process_throttle_event,
+ .unthrottle = process_throttle_event,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 69129e2aa7a1..5ef1ba5e29bb 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -1422,6 +1422,35 @@ static void python_process_event(union perf_event *event,
}
}
+static void python_process_throttle(union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ const char *handler_name;
+ PyObject *handler, *t;
+
+ if (event->header.type == PERF_RECORD_THROTTLE)
+ handler_name = "throttle";
+ else
+ handler_name = "unthrottle";
+ handler = get_handler(handler_name);
+ if (!handler)
+ return;
+
+ t = tuple_new(4);
+ if (!t)
+ return;
+
+ tuple_set_u64(t, 0, sample->time);
+ tuple_set_s32(t, 1, sample->cpu);
+ tuple_set_s32(t, 2, sample->pid);
+ tuple_set_s32(t, 3, sample->tid);
+
+ call_object(handler, t, handler_name);
+
+ Py_DECREF(t);
+}
+
static void python_do_process_switch(union perf_event *event,
struct perf_sample *sample,
struct machine *machine)
@@ -2079,5 +2108,6 @@ struct scripting_ops python_scripting_ops = {
.process_auxtrace_error = python_process_auxtrace_error,
.process_stat = python_process_stat,
.process_stat_interval = python_process_stat_interval,
+ .process_throttle = python_process_throttle,
.generate_script = python_generate_script,
};
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 54aadeedf28c..640981105788 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -90,6 +90,9 @@ struct scripting_ops {
void (*process_stat)(struct perf_stat_config *config,
struct evsel *evsel, u64 tstamp);
void (*process_stat_interval)(u64 tstamp);
+ void (*process_throttle)(union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int (*generate_script) (struct tep_handle *pevent, const char *outfile);
};
--
2.30.2
On Mon, Aug 16, 2021 at 05:21:33PM -0700, Stephen Brennan wrote:
> perf_events may sometimes throttle an event due to creating too many
> samples during a given timer tick. As of now, the perf tool will not
> report on throttling, which means this is a silent error. Implement a
> callback for the throttle and unthrottle events within the Python
> scripting engine, which can allow scripts to detect and report when
> events may have been lost due to throttling.
>
> A python script could simply define throttle() and unthrottle()
> functions to begin receiving them, e.g.:
>
> ```
> from __future__ import print_function
>
> def process_event(param_dict):
> print("event cpu={} time={}".format(
> param_dict["sample"]["cpu"], param_dict["sample"]["time"]))
>
> def throttle(*args):
> print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))
>
> def unthrottle(*args):
> print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
> ```
The throttle event also has 'id' and 'stream_id'. I guess you don't
need them, but maybe we should add them to be complete.
Otherwise looks good.
jirka
>
> Signed-off-by: Stephen Brennan <[email protected]>
> ---
>
> Since this mailing list thread[1] I've been wondering about ways to
> detect and handle throttling. Perf will warn when events are missed
> because the ring buffer filled up, but it will not give any indication
> about the throttling -- except for the throttle message logged by the
> kernel. Ideally, I'd like to also extend the other perf tools to give
> a warning, but detecting it after the fact via a script was easiest to
> implement, and most flexible for me to use. I'd appreciate feedback on
> this change as well as what such a warning in perf record/report would
> look like. For example:
>
> [ perf record: WARNING: samples were throttled %u times for %u seconds ]
>
> [1]: https://lore.kernel.org/linux-perf-users/[email protected]/T/#u
>
> tools/perf/builtin-script.c | 13 ++++++++
> .../scripting-engines/trace-event-python.c | 30 +++++++++++++++++++
> tools/perf/util/trace-event.h | 3 ++
> 3 files changed, 46 insertions(+)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 064da7f3618d..072869a35f85 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2492,6 +2492,17 @@ process_lost_event(struct perf_tool *tool,
> sample->tid);
> }
>
> +static int
> +process_throttle_event(struct perf_tool *tool __maybe_unused,
> + union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine)
> +{
> + if (scripting_ops && scripting_ops->process_throttle)
> + scripting_ops->process_throttle(event, sample, machine);
> + return 0;
> +}
> +
> static int
> process_finished_round_event(struct perf_tool *tool __maybe_unused,
> union perf_event *event,
> @@ -3652,6 +3663,8 @@ int cmd_script(int argc, const char **argv)
> .stat_config = process_stat_config_event,
> .thread_map = process_thread_map_event,
> .cpu_map = process_cpu_map_event,
> + .throttle = process_throttle_event,
> + .unthrottle = process_throttle_event,
> .ordered_events = true,
> .ordering_requires_timestamps = true,
> },
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 69129e2aa7a1..5ef1ba5e29bb 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -1422,6 +1422,35 @@ static void python_process_event(union perf_event *event,
> }
> }
>
> +static void python_process_throttle(union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine)
> +{
> + const char *handler_name;
> + PyObject *handler, *t;
> +
> + if (event->header.type == PERF_RECORD_THROTTLE)
> + handler_name = "throttle";
> + else
> + handler_name = "unthrottle";
> + handler = get_handler(handler_name);
> + if (!handler)
> + return;
> +
> + t = tuple_new(4);
> + if (!t)
> + return;
> +
> + tuple_set_u64(t, 0, sample->time);
> + tuple_set_s32(t, 1, sample->cpu);
> + tuple_set_s32(t, 2, sample->pid);
> + tuple_set_s32(t, 3, sample->tid);
> +
> + call_object(handler, t, handler_name);
> +
> + Py_DECREF(t);
> +}
> +
> static void python_do_process_switch(union perf_event *event,
> struct perf_sample *sample,
> struct machine *machine)
> @@ -2079,5 +2108,6 @@ struct scripting_ops python_scripting_ops = {
> .process_auxtrace_error = python_process_auxtrace_error,
> .process_stat = python_process_stat,
> .process_stat_interval = python_process_stat_interval,
> + .process_throttle = python_process_throttle,
> .generate_script = python_generate_script,
> };
> diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
> index 54aadeedf28c..640981105788 100644
> --- a/tools/perf/util/trace-event.h
> +++ b/tools/perf/util/trace-event.h
> @@ -90,6 +90,9 @@ struct scripting_ops {
> void (*process_stat)(struct perf_stat_config *config,
> struct evsel *evsel, u64 tstamp);
> void (*process_stat_interval)(u64 tstamp);
> + void (*process_throttle)(union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine);
> int (*generate_script) (struct tep_handle *pevent, const char *outfile);
> };
>
> --
> 2.30.2
>
Hi Jiri,
On 8/31/21 11:46 AM, Jiri Olsa wrote:
> On Mon, Aug 16, 2021 at 05:21:33PM -0700, Stephen Brennan wrote:
>> perf_events may sometimes throttle an event due to creating too many
>> samples during a given timer tick. As of now, the perf tool will not
>> report on throttling, which means this is a silent error. Implement a
>> callback for the throttle and unthrottle events within the Python
>> scripting engine, which can allow scripts to detect and report when
>> events may have been lost due to throttling.
>>
>> A python script could simply define throttle() and unthrottle()
>> functions to begin receiving them, e.g.:
>>
>> ```
>> from __future__ import print_function
>>
>> def process_event(param_dict):
>> print("event cpu={} time={}".format(
>> param_dict["sample"]["cpu"], param_dict["sample"]["time"]))
>>
>> def throttle(*args):
>> print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))
>>
>> def unthrottle(*args):
>> print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
>> ```
>
> throttle event has also 'id' and 'stream_id' I guess you don't
> need it, but maybe we should add it to be complete
I tried adding these from struct perf_sample.id and struct
perf_sample.sample_id respectively. I then tested these on a perf.data
which contains a throttle event. The values which my python script
reported were 0xFFFFFFFFFFFFFFFF -- is this expected? If so, I'll send
my revised patch.
Thanks,
Stephen
`perf report --header-only` from my testing perf.data:
# ========
# captured on : Wed Aug 11 11:55:44 2021
# header version : 1
# data offset : 264
# data size : 18240
# feat offset : 18504
# hostname : stepbren-ol7-2
# os release : 5.4.17-2102.203.6.el7uek.x86_64
# perf version : 5.4.17-2102.203.6.el7uek.x86_64
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
# cpuid : GenuineIntel,6,85,4
# total memory : 30522624 kB
# cmdline : /usr/libexec/perf.5.4.17-2102.203.6.el7uek.x86_64 record -c
100000 -e cycles -- sh -c echo 2000 > /proc/irqoff_sleep_millis
# event : name = cycles, , id = { 5, 6, 7, 8 }, size = 112, {
sample_period, sample_freq } = 100000, sample_type = IP|TID|TIME,
read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1,
enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1,
mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, uprobe = 7, cpu = 4, breakpoint = 5,
tracepoint = 2, kprobe = 6, msr = 8
# CACHE info available, use -I to display
# time of first sample : 228.321751
# time of last sample : 230.362698
# sample duration : 2040.947 ms
# MEM_TOPOLOGY info available, use -I to display
# missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT
CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
HYBRID_CPU_PMU_CAPS
# ========
#
Em Tue, Aug 31, 2021 at 02:47:47PM -0700, Stephen Brennan escreveu:
> On 8/31/21 11:46 AM, Jiri Olsa wrote:
> > On Mon, Aug 16, 2021 at 05:21:33PM -0700, Stephen Brennan wrote:
> > > perf_events may sometimes throttle an event due to creating too many
> > > samples during a given timer tick. As of now, the perf tool will not
> > > report on throttling, which means this is a silent error. Implement a
> > > callback for the throttle and unthrottle events within the Python
> > > scripting engine, which can allow scripts to detect and report when
> > > events may have been lost due to throttling.
> > > A python script could simply define throttle() and unthrottle()
> > > functions to begin receiving them, e.g.:
> > > ```
> > > from __future__ import print_function
> > > def process_event(param_dict):
> > > print("event cpu={} time={}".format(
> > > param_dict["sample"]["cpu"], param_dict["sample"]["time"]))
> > > def throttle(*args):
> > > print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))
> > > def unthrottle(*args):
> > > print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
> > > ```
> > throttle event has also 'id' and 'stream_id' I guess you don't
> > need it, but maybe we should add it to be complete
> I tried adding these from struct perf_sample.id and struct
> perf_sample.sample_id respectively. I then tested these on a perf.data which
> contains a throttle event. The values which my python script reported were
> 0xFFFFFFFFFFFFFFFF -- is this expected? If so, I'll send my revised patch.
So, perf_sample.X is only defined if PERF_SAMPLE_X is in
perf_event_attr.sample_type, and for the perf.data file below it is:
sample_type = IP|TID|TIME
So only perf_sample.ip, perf_sample.tid and perf_sample.time are set,
not perf_sample.id nor perf_sample.stream_id.
What Jiri suggests is to use the _fixed_ payload for
PERF_SAMPLE_THROTTLE/UNTHROTTLE, which is, according to
include/uapi/linux/perf_event.h:
* struct {
* struct perf_event_header header;
* u64 time; <<<<
* u64 id; <<<<
* u64 stream_id; <<<<
* struct sample_id sample_id;
* };
*/
PERF_RECORD_THROTTLE = 5,
PERF_RECORD_UNTHROTTLE = 6,
The 'time', 'id' and 'stream_id' should always be there, while what is
in 'sample_id' is documented in the same file at the beginning of the
'enum perf_event_type' definition:
* If perf_event_attr.sample_id_all is set then all event types will
* have the sample_type selected fields related to where/when
* (identity) an event took place (TID, TIME, ID, STREAM_ID, CPU,
* IDENTIFIER) described in PERF_RECORD_SAMPLE below, it will be stashed
* just after the perf_event_header and the fields already present for
* the existing fields, i.e. at the end of the payload. That way a newer
* perf.data file will be supported by older perf tools, with these new
* optional fields being ignored.
*
* struct sample_id {
* { u32 pid, tid; } && PERF_SAMPLE_TID
* { u64 time; } && PERF_SAMPLE_TIME
* { u64 id; } && PERF_SAMPLE_ID
* { u64 stream_id;} && PERF_SAMPLE_STREAM_ID
* { u32 cpu, res; } && PERF_SAMPLE_CPU
* { u64 id; } && PERF_SAMPLE_IDENTIFIER
* } && perf_event_attr::sample_id_all
*
* Note that PERF_SAMPLE_IDENTIFIER duplicates PERF_SAMPLE_ID. The
* advantage of PERF_SAMPLE_IDENTIFIER is that its position is fixed
* relative to header.size.
There are artifacts here from the evolution of the perf_event subsystem,
i.e. throttle.id is equal to throttle.sample-id.id when PERF_SAMPLE_ID
is set in perf_event_attr.sample_type, ditto for the 'time' and 'stream_id'
entries.
But since 'time', 'id' and 'stream_id' are _always_ set in the
PERF_RECORD_THROTTLE event, we can as well make them available via perf
script python.
Take a look at ./kernel/events/core.c, function perf_log_throttle(), the
only one in the kernel that emits the PERF_RECORD_THROTTLE meta event
and see that it always set the 'time', 'id' and 'stream_id' fields:
static void perf_log_throttle(struct perf_event *event, int enable)
{
struct perf_output_handle handle;
struct perf_sample_data sample;
int ret;
struct {
struct perf_event_header header;
u64 time;
u64 id;
u64 stream_id;
} throttle_event = {
.header = {
.type = PERF_RECORD_THROTTLE,
.misc = 0,
.size = sizeof(throttle_event),
},
.time = perf_event_clock(event),
.id = primary_event_id(event),
.stream_id = event->id,
};
if (enable)
throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
perf_event_header__init_id(&throttle_event.header, &sample, event);
ret = perf_output_begin(&handle, &sample, event,
throttle_event.header.size);
if (ret)
return;
perf_output_put(&handle, throttle_event);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
}
So instead of doing:
+ tuple_set_u64(t, 0, sample->time);
+ tuple_set_s32(t, 1, sample->cpu);
+ tuple_set_s32(t, 2, sample->pid);
+ tuple_set_s32(t, 3, sample->tid);
That may not be set at all, you should do:
+ tuple_set_u64(t, 0, event->throttle.time);
+ tuple_set_u64(t, 1, event->throttle.id);
+ tuple_set_u64(t, 2, event->throttle.stream_id);
These can be left as is with 0xFFFFFFFFFFFFFFFF meaning its not set:
+ tuple_set_s32(t, 3, sample->cpu);
+ tuple_set_s32(t, 4, sample->pid);
+ tuple_set_s32(t, 5, sample->tid);
To confirm whether a field is valid, the user can check perf_event_attr.sample_type &
PERF_SAMPLE_X.
- Arnaldo
> Thanks,
> Stephen
>
> `perf report --header-only` from my testing perf.data:
>
> # ========
> # captured on : Wed Aug 11 11:55:44 2021
> # header version : 1
> # data offset : 264
> # data size : 18240
> # feat offset : 18504
> # hostname : stepbren-ol7-2
> # os release : 5.4.17-2102.203.6.el7uek.x86_64
> # perf version : 5.4.17-2102.203.6.el7uek.x86_64
> # arch : x86_64
> # nrcpus online : 4
> # nrcpus avail : 4
> # cpudesc : Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
> # cpuid : GenuineIntel,6,85,4
> # total memory : 30522624 kB
> # cmdline : /usr/libexec/perf.5.4.17-2102.203.6.el7uek.x86_64 record -c
> 100000 -e cycles -- sh -c echo 2000 > /proc/irqoff_sleep_millis
> # event : name = cycles, , id = { 5, 6, 7, 8 }, size = 112, { sample_period,
> sample_freq } = 100000, sample_type = IP|TID|TIME, read_format = ID,
> disabled = 1, inherit = 1, mmap = 1, comm = 1, enable_on_exec = 1, task = 1,
> sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1,
> bpf_event = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: software = 1, uprobe = 7, cpu = 4, breakpoint = 5,
> tracepoint = 2, kprobe = 6, msr = 8
> # CACHE info available, use -I to display
> # time of first sample : 228.321751
> # time of last sample : 230.362698
> # sample duration : 2040.947 ms
> # MEM_TOPOLOGY info available, use -I to display
> # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
> HYBRID_CPU_PMU_CAPS
> # ========
Arnaldo Carvalho de Melo <[email protected]> writes:
> Em Tue, Aug 31, 2021 at 02:47:47PM -0700, Stephen Brennan escreveu:
>> On 8/31/21 11:46 AM, Jiri Olsa wrote:
>> > On Mon, Aug 16, 2021 at 05:21:33PM -0700, Stephen Brennan wrote:
>> > > perf_events may sometimes throttle an event due to creating too many
>> > > samples during a given timer tick. As of now, the perf tool will not
>> > > report on throttling, which means this is a silent error. Implement a
>> > > callback for the throttle and unthrottle events within the Python
>> > > scripting engine, which can allow scripts to detect and report when
>> > > events may have been lost due to throttling.
>
>> > > A python script could simply define throttle() and unthrottle()
>> > > functions to begin receiving them, e.g.:
>
>> > > ```
>> > > from __future__ import print_function
>
>> > > def process_event(param_dict):
>> > > print("event cpu={} time={}".format(
>> > > param_dict["sample"]["cpu"], param_dict["sample"]["time"]))
>
>> > > def throttle(*args):
>> > > print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))
>
>> > > def unthrottle(*args):
>> > > print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
>> > > ```
>
>> > throttle event has also 'id' and 'stream_id' I guess you don't
>> > need it, but maybe we should add it to be complete
>
>> I tried adding these from struct perf_sample.id and struct
>> perf_sample.sample_id respectively. I then tested these on a perf.data which
>> contains a throttle event. The values which my python script reported were
>> 0xFFFFFFFFFFFFFFFF -- is this expected? If so, I'll send my revised patch.
>
> So, perf_sample.X is only defined if PERF_SAMPLE_X is in
> perf_event_attr.sample_type, and for the perf.data file below it is:
>
> sample_type = IP|TID|TIME
>
> So only perf_sample.ip, perf_sample.tid and perf_sample.time are set,
> not perf_sample.id nor perf_sample.stream_id.
>
> What Jiri suggests is to use the _fixed_ payload for
> PERF_SAMPLE_THROTTLE/UNTHROTTLE, which is, according to
> include/uapi/linux/perf_event.h:
>
> * struct {
> * struct perf_event_header header;
> * u64 time; <<<<
> * u64 id; <<<<
> * u64 stream_id; <<<<
> * struct sample_id sample_id;
> * };
> */
> PERF_RECORD_THROTTLE = 5,
> PERF_RECORD_UNTHROTTLE = 6,
>
>
> The 'time', 'id' and 'stream_id' should always be there, while what is
> in 'sample_id' is documented in the same file at the beginning of the
> 'enum perf_event_type' definition:
>
> * If perf_event_attr.sample_id_all is set then all event types will
> * have the sample_type selected fields related to where/when
> * (identity) an event took place (TID, TIME, ID, STREAM_ID, CPU,
> * IDENTIFIER) described in PERF_RECORD_SAMPLE below, it will be stashed
> * just after the perf_event_header and the fields already present for
> * the existing fields, i.e. at the end of the payload. That way a newer
> * perf.data file will be supported by older perf tools, with these new
> * optional fields being ignored.
> *
> * struct sample_id {
> * { u32 pid, tid; } && PERF_SAMPLE_TID
> * { u64 time; } && PERF_SAMPLE_TIME
> * { u64 id; } && PERF_SAMPLE_ID
> * { u64 stream_id;} && PERF_SAMPLE_STREAM_ID
> * { u32 cpu, res; } && PERF_SAMPLE_CPU
> * { u64 id; } && PERF_SAMPLE_IDENTIFIER
> * } && perf_event_attr::sample_id_all
> *
> * Note that PERF_SAMPLE_IDENTIFIER duplicates PERF_SAMPLE_ID. The
> * advantage of PERF_SAMPLE_IDENTIFIER is that its position is fixed
> * relative to header.size.
>
> There are artifacts here from the evolution of the perf_event subsystem,
> i.e. throttle.id is equal to throttle.sample-id.id when PERF_SAMPLE_ID
> is set in perf_event_attr.sample_type, ditto for 'time' and 'stream_id'.
> entries.
>
> But since 'time', 'id' and 'stream_id' are _always_ set in the
> PERF_RECORD_THROTTLE event, we can as well make them available via perf
> script python.
>
> Take a look at ./kernel/events/core.c, function perf_log_throttle(), the
> only one in the kernel that emits the PERF_RECORD_THROTTLE meta event
> and see that it always set the 'time', 'id' and 'stream_id' fields:
>
> static void perf_log_throttle(struct perf_event *event, int enable)
> {
> struct perf_output_handle handle;
> struct perf_sample_data sample;
> int ret;
>
> struct {
> struct perf_event_header header;
> u64 time;
> u64 id;
> u64 stream_id;
> } throttle_event = {
> .header = {
> .type = PERF_RECORD_THROTTLE,
> .misc = 0,
> .size = sizeof(throttle_event),
> },
> .time = perf_event_clock(event),
> .id = primary_event_id(event),
> .stream_id = event->id,
> };
>
> if (enable)
> throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
>
> perf_event_header__init_id(&throttle_event.header, &sample, event);
>
> ret = perf_output_begin(&handle, &sample, event,
> throttle_event.header.size);
> if (ret)
> return;
>
> perf_output_put(&handle, throttle_event);
> perf_event__output_id_sample(event, &handle, &sample);
> perf_output_end(&handle);
> }
>
> So instead of doing:
>
> + tuple_set_u64(t, 0, sample->time);
> + tuple_set_s32(t, 1, sample->cpu);
> + tuple_set_s32(t, 2, sample->pid);
> + tuple_set_s32(t, 3, sample->tid);
>
> That may not be set at all, you should do:
>
> + tuple_set_u64(t, 0, event->throttle.time);
> + tuple_set_u64(t, 1, event->throttle.id);
> + tuple_set_u64(t, 2, event->throttle.stream_id);
>
> These can be left as is with 0xFFFFFFFFFFFFFFFF meaning its not set:
>
> + tuple_set_s32(t, 3, sample->cpu);
> + tuple_set_s32(t, 4, sample->pid);
> + tuple_set_s32(t, 5, sample->tid);
>
> to confirm the user can look at perf_event_attr.sample_type |
> PERF_SAMPLE_X.
Thank you so much for this detailed explanation! I'll send v2 as you
described. This definitely helps me understand the data structures at
play much more.
Thanks,
Stephen
>
> - Arnaldo
>
>> Thanks,
>> Stephen
>>
>> `perf report --header-only` from my testing perf.data:
>>
>> # ========
>> # captured on : Wed Aug 11 11:55:44 2021
>> # header version : 1
>> # data offset : 264
>> # data size : 18240
>> # feat offset : 18504
>> # hostname : stepbren-ol7-2
>> # os release : 5.4.17-2102.203.6.el7uek.x86_64
>> # perf version : 5.4.17-2102.203.6.el7uek.x86_64
>> # arch : x86_64
>> # nrcpus online : 4
>> # nrcpus avail : 4
>> # cpudesc : Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
>> # cpuid : GenuineIntel,6,85,4
>> # total memory : 30522624 kB
>> # cmdline : /usr/libexec/perf.5.4.17-2102.203.6.el7uek.x86_64 record -c
>> 100000 -e cycles -- sh -c echo 2000 > /proc/irqoff_sleep_millis
>> # event : name = cycles, , id = { 5, 6, 7, 8 }, size = 112, { sample_period,
>> sample_freq } = 100000, sample_type = IP|TID|TIME, read_format = ID,
>> disabled = 1, inherit = 1, mmap = 1, comm = 1, enable_on_exec = 1, task = 1,
>> sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1,
>> bpf_event = 1
>> # CPU_TOPOLOGY info available, use -I to display
>> # NUMA_TOPOLOGY info available, use -I to display
>> # pmu mappings: software = 1, uprobe = 7, cpu = 4, breakpoint = 5,
>> tracepoint = 2, kprobe = 6, msr = 8
>> # CACHE info available, use -I to display
>> # time of first sample : 228.321751
>> # time of last sample : 230.362698
>> # sample duration : 2040.947 ms
>> # MEM_TOPOLOGY info available, use -I to display
>> # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT
>> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
>> HYBRID_CPU_PMU_CAPS
>> # ========