2024-03-22 16:43:06

by Ben Gainey

[permalink] [raw]
Subject: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

This change allows events to use PERF_SAMPLE READ with inherit so long
as both inherit_stat and PERF_SAMPLE_TID are set.

Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
restriction assumes the user is interested in collecting aggregate
statistics as per `perf stat`. It prevents a user from collecting
per-thread samples using counter groups from a multi-threaded or
multi-process application, as with `perf record -e '{....}:S'`. Instead
users must use system-wide mode, or forgo the ability to sample counter
groups. System-wide mode is often problematic as it requires specific
permissions (no CAP_PERFMON / root access), or may lead to capture of
significant amounts of extra data from other processes running on the
system.

Perf already supports the ability to collect per-thread counts with
`inherit` via the `inherit_stat` flag. This patch changes
`perf_event_alloc` relaxing the restriction to combine `inherit` with
`PERF_SAMPLE_READ` so that the combination will be allowed so long as
`inherit_stat` and `PERF_SAMPLE_TID` are enabled.

In this configuration stream ids (such as may appear in the read_format
field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
the pair of (stream id, tid) uniquely identify each event. Tools that
rely on this, for example to calculate a delta between samples, would
need updating to take this into account. Previously valid event
configurations (system-wide, no-inherit and so on) where each stream id
is the identifier are unaffected.


Changes since v3:
- Cleaned up perf test data changes incorrectly included into this
series from elsewhere.

Changes since v2:
- Rebase on v6.8
- Respond to James Clarke's feedback; fixup some typos and move some
repeated checks into a helper macro.
- Cleaned up checkpatch lints.
- Updated perf test; fixed evsel handling so that existing tests pass
and added new tests to cover the new behaviour.

Changes since v1:
- Rebase on v6.8-rc1
- Fixed value written into sample after child exists.
- Modified handling of switch-out so that context with these events
take the slow path, so that the per-event/per-thread PMU state is
correctly switched.
- Modified perf tools to support this mode of operation.

Ben Gainey (4):
perf: Support PERF_SAMPLE_READ with inherit_stat
tools/perf: Track where perf_sample_ids need per-thread periods
tools/perf: Correctly calculate sample period for inherited
SAMPLE_READ values
tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when
opening events

include/linux/perf_event.h | 1 +
kernel/events/core.c | 62 ++++++++++----
tools/lib/perf/evlist.c | 1 +
tools/lib/perf/evsel.c | 48 +++++++++++
tools/lib/perf/include/internal/evsel.h | 55 ++++++++++++-
.../test-record-group-sampling-inherit-stat | 62 ++++++++++++++
tools/perf/util/evsel.c | 82 ++++++++++++++++++-
tools/perf/util/evsel.h | 1 +
tools/perf/util/session.c | 11 ++-
9 files changed, 301 insertions(+), 22 deletions(-)
create mode 100644 tools/perf/tests/attr/test-record-group-sampling-inherit-stat

--
2.44.0



2024-03-22 16:43:13

by Ben Gainey

[permalink] [raw]
Subject: [PATCH v4 1/4] perf: Support PERF_SAMPLE_READ with inherit_stat

This change allows events to use PERF_SAMPLE READ with inherit
so long as both inherit_stat and PERF_SAMPLE_TID are set.

In this configuration, an event will be inherited into any
child processes / threads, allowing convenient profiling of a
multiprocess or multithreaded application, whilst allowing
profiling tools to collect per-thread samples, in particular
of groups of counters.

Signed-off-by: Ben Gainey <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 62 +++++++++++++++++++++++++++++---------
2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a15c0c6f8a..7d405dff6694 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -932,6 +932,7 @@ struct perf_event_context {

int nr_task_data;
int nr_stat;
+ int nr_stat_read;
int nr_freq;
int rotate_disable;

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0f0f71213a1..870a7dbd8d5f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1767,6 +1767,18 @@ perf_event_groups_next(struct perf_event *event, struct pmu *pmu)
event = rb_entry_safe(rb_next(&event->group_node), \
typeof(*event), group_node))

+/*
+ * Does the event attribute request inherit with PERF_SAMPLE_READ
+ */
+#define perf_attr_has_inherit_and_sample_read(attr) \
+ ((attr)->inherit && ((attr)->sample_type & PERF_SAMPLE_READ))
+
+/*
+ * Does the event request an attribte that requests inherit with PERF_SAMPLE_READ
+ */
+#define perf_event_has_inherit_and_sample_read(event) \
+ perf_attr_has_inherit_and_sample_read(&((event)->attr))
+
/*
* Add an event from the lists for its context.
* Must be called with ctx->mutex and ctx->lock held.
@@ -1795,8 +1807,11 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->nr_events++;
if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
ctx->nr_user++;
- if (event->attr.inherit_stat)
+ if (event->attr.inherit_stat) {
ctx->nr_stat++;
+ if (perf_event_has_inherit_and_sample_read(event))
+ ctx->nr_stat_read++;
+ }

if (event->state > PERF_EVENT_STATE_OFF)
perf_cgroup_event_enable(event, ctx);
@@ -2019,8 +2034,11 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->nr_events--;
if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
ctx->nr_user--;
- if (event->attr.inherit_stat)
+ if (event->attr.inherit_stat) {
ctx->nr_stat--;
+ if (perf_event_has_inherit_and_sample_read(event))
+ ctx->nr_stat_read--;
+ }

list_del_rcu(&event->event_entry);

@@ -3529,11 +3547,19 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
perf_ctx_disable(ctx, false);

/* PMIs are disabled; ctx->nr_pending is stable. */
- if (local_read(&ctx->nr_pending) ||
+ if (ctx->nr_stat_read ||
+ next_ctx->nr_stat_read ||
+ local_read(&ctx->nr_pending) ||
local_read(&next_ctx->nr_pending)) {
/*
* Must not swap out ctx when there's pending
* events that rely on the ctx->task relation.
+ *
+ * Likewise, when a context contains inherit +
+ * inherit_stat + SAMPLE_READ events they
+ * should be switched out using the slow path
+ * so that they are treated as if they were
+ * distinct contexts.
*/
raw_spin_unlock(&next_ctx->lock);
rcu_read_unlock();
@@ -4533,8 +4559,11 @@ static void __perf_event_read(void *info)
raw_spin_unlock(&ctx->lock);
}

-static inline u64 perf_event_count(struct perf_event *event)
+static inline u64 perf_event_count(struct perf_event *event, bool self_value_only)
{
+ if (self_value_only && perf_event_has_inherit_and_sample_read(event))
+ return local64_read(&event->count);
+
return local64_read(&event->count) + atomic64_read(&event->child_count);
}

@@ -5454,7 +5483,7 @@ static u64 __perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *
mutex_lock(&event->child_mutex);

(void)perf_event_read(event, false);
- total += perf_event_count(event);
+ total += perf_event_count(event, false);

*enabled += event->total_time_enabled +
atomic64_read(&event->child_total_time_enabled);
@@ -5463,7 +5492,7 @@ static u64 __perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *

list_for_each_entry(child, &event->child_list, child_list) {
(void)perf_event_read(child, false);
- total += perf_event_count(child);
+ total += perf_event_count(child, false);
*enabled += child->total_time_enabled;
*running += child->total_time_running;
}
@@ -5545,14 +5574,14 @@ static int __perf_read_group_add(struct perf_event *leader,
/*
* Write {count,id} tuples for every sibling.
*/
- values[n++] += perf_event_count(leader);
+ values[n++] += perf_event_count(leader, false);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&leader->lost_samples);

for_each_sibling_event(sub, leader) {
- values[n++] += perf_event_count(sub);
+ values[n++] += perf_event_count(sub, false);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
@@ -6132,7 +6161,7 @@ void perf_event_update_userpage(struct perf_event *event)
++userpg->lock;
barrier();
userpg->index = perf_event_index(event);
- userpg->offset = perf_event_count(event);
+ userpg->offset = perf_event_count(event, false);
if (userpg->index)
userpg->offset -= local64_read(&event->hw.prev_count);

@@ -7200,7 +7229,7 @@ static void perf_output_read_one(struct perf_output_handle *handle,
u64 values[5];
int n = 0;

- values[n++] = perf_event_count(event);
+ values[n++] = perf_event_count(event, true);
if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
values[n++] = enabled +
atomic64_read(&event->child_total_time_enabled);
@@ -7245,7 +7274,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
(leader->state == PERF_EVENT_STATE_ACTIVE))
leader->pmu->read(leader);

- values[n++] = perf_event_count(leader);
+ values[n++] = perf_event_count(leader, true);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
@@ -7260,7 +7289,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
(sub->state == PERF_EVENT_STATE_ACTIVE))
sub->pmu->read(sub);

- values[n++] = perf_event_count(sub);
+ values[n++] = perf_event_count(sub, false);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
@@ -12010,10 +12039,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
local64_set(&hwc->period_left, hwc->sample_period);

/*
- * We currently do not support PERF_SAMPLE_READ on inherited events.
+ * We do not support PERF_SAMPLE_READ on inherited events unless
+ * inherit_stat and PERF_SAMPLE_TID are also selected, which allows
+ * inherited events to collect per-thread samples.
* See perf_output_read().
*/
- if (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ))
+ if (perf_attr_has_inherit_and_sample_read(attr)
+ && !(attr->inherit_stat && (attr->sample_type & PERF_SAMPLE_TID)))
goto err_ns;

if (!has_branch_stack(event))
@@ -13037,7 +13069,7 @@ static void sync_child_event(struct perf_event *child_event)
perf_event_read_event(child_event, task);
}

- child_val = perf_event_count(child_event);
+ child_val = perf_event_count(child_event, false);

/*
* Add back the child's count to the parent's count:
--
2.44.0


2024-03-22 16:43:23

by Ben Gainey

[permalink] [raw]
Subject: [PATCH v4 2/4] tools/perf: Track where perf_sample_ids need per-thread periods

perf_sample_ids and related evlist/evsel code are modified to track
which events combine inherit+inherit_stat+PERF_SAMPLE_READ.

Events with this combination of properties must be handled differently
when tracking each streams sample period, as their stream-id is now only
unique per thread, rather than globally unique.

Signed-off-by: Ben Gainey <[email protected]>
---
tools/lib/perf/evlist.c | 1 +
tools/lib/perf/evsel.c | 7 +++++++
tools/lib/perf/include/internal/evsel.h | 9 +++++++++
3 files changed, 17 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 058e3ff10f9b..c585c49491a5 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -255,6 +255,7 @@ static void perf_evlist__id_hash(struct perf_evlist *evlist,

sid->id = id;
sid->evsel = evsel;
+ sid->period_per_thread = perf_evsel__attr_has_per_thread_sample_period(evsel);
hash = hash_64(sid->id, PERF_EVLIST__HLIST_BITS);
hlist_add_head(&sid->node, &evlist->heads[hash]);
}
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index c07160953224..dd60ee0557d8 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -537,6 +537,13 @@ void perf_evsel__free_id(struct perf_evsel *evsel)
evsel->ids = 0;
}

+bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
+{
+ return (evsel->attr.sample_type & PERF_SAMPLE_READ)
+ && evsel->attr.inherit
+ && evsel->attr.inherit_stat;
+}
+
void perf_counts_values__scale(struct perf_counts_values *count,
bool scale, __s8 *pscaled)
{
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 5cd220a61962..8dd58149986c 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -36,6 +36,13 @@ struct perf_sample_id {

/* Holds total ID period value for PERF_SAMPLE_READ processing. */
u64 period;
+
+ /*
+ * When inherit+inherit_stat is combined with PERF_SAMPLE_READ, the
+ * period value is per (sample_id, thread) tuple, rather than per
+ * sample_id.
+ */
+ bool period_per_thread;
};

struct perf_evsel {
@@ -88,4 +95,6 @@ int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter);
int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads);
void perf_evsel__free_id(struct perf_evsel *evsel);

+bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel);
+
#endif /* __LIBPERF_INTERNAL_EVSEL_H */
--
2.44.0


2024-03-22 16:43:46

by Ben Gainey

[permalink] [raw]
Subject: [PATCH v4 3/4] tools/perf: Correctly calculate sample period for inherited SAMPLE_READ values

Sample period calculation is updated to take into account the fact that
the stream-id for events with inherit+inherit_stat+PERF_SAMPLE_READ
is now unique per thread, rather than globally unique.

perf_sample_id is modified to support tracking per-stream+per-thread
values, along with the existing global per-stream values. In the
per-thread case, values are stored in a hash by tid.

deliver_sample_value is modified to correctly locate the previous
sample storage based on the attribute, stream id and thread id.

Signed-off-by: Ben Gainey <[email protected]>
---
tools/lib/perf/evsel.c | 41 ++++++++++++++++++++++
tools/lib/perf/include/internal/evsel.h | 46 +++++++++++++++++++++++--
tools/perf/util/session.c | 11 ++++--
3 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index dd60ee0557d8..2bdba78cc43e 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -5,6 +5,7 @@
#include <perf/evsel.h>
#include <perf/cpumap.h>
#include <perf/threadmap.h>
+#include <linux/hash.h>
#include <linux/list.h>
#include <internal/evsel.h>
#include <linux/zalloc.h>
@@ -23,6 +24,7 @@ void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
int idx)
{
INIT_LIST_HEAD(&evsel->node);
+ INIT_LIST_HEAD(&evsel->period_per_thread_periods);
evsel->attr = *attr;
evsel->idx = idx;
evsel->leader = evsel;
@@ -531,10 +533,17 @@ int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads)

void perf_evsel__free_id(struct perf_evsel *evsel)
{
+ struct perf_sample_id_period *pos, *n;
+
xyarray__delete(evsel->sample_id);
evsel->sample_id = NULL;
zfree(&evsel->id);
evsel->ids = 0;
+
+ perf_evsel_for_each_per_thread_period_safe(evsel, n, pos) {
+ list_del_init(&pos->node);
+ zfree(pos);
+ }
}

bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
@@ -544,6 +553,38 @@ bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
&& evsel->attr.inherit_stat;
}

+u64 *perf_sample_id__get_period_storage(struct perf_sample_id *sid, u32 tid)
+{
+ struct hlist_head *head;
+ struct perf_sample_id_period *res;
+ int hash;
+
+ if (!sid->period_per_thread)
+ return &sid->period;
+
+ hash = hash_32(tid, PERF_SAMPLE_ID__HLIST_BITS);
+ head = &sid->periods[hash];
+
+ hlist_for_each_entry(res, head, hnode)
+ if (res->tid == tid)
+ return &res->period;
+
+ if (sid->evsel == NULL)
+ return NULL;
+
+ res = zalloc(sizeof(struct perf_sample_id_period));
+ if (res == NULL)
+ return NULL;
+
+ INIT_LIST_HEAD(&res->node);
+ res->tid = tid;
+
+ list_add_tail(&res->node, &sid->evsel->period_per_thread_periods);
+ hlist_add_head(&res->hnode, &sid->periods[hash]);
+
+ return &res->period;
+}
+
void perf_counts_values__scale(struct perf_counts_values *count,
bool scale, __s8 *pscaled)
{
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 8dd58149986c..52ec2edf628b 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -11,6 +11,32 @@
struct perf_thread_map;
struct xyarray;

+/**
+ * The per-thread accumulated period storage node.
+ */
+struct perf_sample_id_period {
+ struct list_head node;
+ struct hlist_node hnode;
+ /* The thread that the values belongs to */
+ u32 tid;
+ /* Holds total ID period value for PERF_SAMPLE_READ processing. */
+ u64 period;
+};
+
+/**
+ * perf_evsel_for_each_per_thread_period_safe - safely iterate thru all the
+ * period_per_thread_periods
+ * @evlist:perf_evsel instance to iterate
+ * @item: struct perf_sample_id_period iterator
+ * @tmp: struct perf_sample_id_period temp iterator
+ */
+#define perf_evsel_for_each_per_thread_period_safe(evsel, tmp, item) \
+ list_for_each_entry_safe(item, tmp, &(evsel)->period_per_thread_periods, node)
+
+
+#define PERF_SAMPLE_ID__HLIST_BITS 4
+#define PERF_SAMPLE_ID__HLIST_SIZE (1 << PERF_SAMPLE_ID__HLIST_BITS)
+
/*
* Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there are
* more than one entry in the evlist.
@@ -19,6 +45,7 @@ struct perf_sample_id {
struct hlist_node node;
u64 id;
struct perf_evsel *evsel;
+
/*
* 'idx' will be used for AUX area sampling. A sample will have AUX area
* data that will be queued for decoding, where there are separate
@@ -34,8 +61,18 @@ struct perf_sample_id {
pid_t machine_pid;
struct perf_cpu vcpu;

- /* Holds total ID period value for PERF_SAMPLE_READ processing. */
- u64 period;
+ union {
+ /*
+ * Holds total ID period value for PERF_SAMPLE_READ processing
+ * (when period is not per-thread).
+ */
+ u64 period;
+ /*
+ * Holds total ID period value for PERF_SAMPLE_READ processing
+ * (when period is per-thread).
+ */
+ struct hlist_head periods[PERF_SAMPLE_ID__HLIST_SIZE];
+ };

/*
* When inherit+inherit_stat is combined with PERF_SAMPLE_READ, the
@@ -65,6 +102,9 @@ struct perf_evsel {
u32 ids;
struct perf_evsel *leader;

+ /* Where period_per_thread is true, stores the per-thread values */
+ struct list_head period_per_thread_periods;
+
/* parse modifier helper */
int nr_members;
/*
@@ -97,4 +137,6 @@ void perf_evsel__free_id(struct perf_evsel *evsel);

bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel);

+u64 *perf_sample_id__get_period_storage(struct perf_sample_id *sid, u32 tid);
+
#endif /* __LIBPERF_INTERNAL_EVSEL_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 199d3e8df315..2d6a92374847 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1478,14 +1478,19 @@ static int deliver_sample_value(struct evlist *evlist,
{
struct perf_sample_id *sid = evlist__id2sid(evlist, v->id);
struct evsel *evsel;
+ u64 *storage = NULL;

if (sid) {
+ storage = perf_sample_id__get_period_storage(sid, sample->tid);
+ }
+
+ if (storage) {
sample->id = v->id;
- sample->period = v->value - sid->period;
- sid->period = v->value;
+ sample->period = v->value - *storage;
+ *storage = v->value;
}

- if (!sid || sid->evsel == NULL) {
+ if (!storage || sid->evsel == NULL) {
++evlist->stats.nr_unknown_id;
return 0;
}
--
2.44.0


2024-03-22 16:43:56

by Ben Gainey

[permalink] [raw]
Subject: [PATCH v4 4/4] tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when opening events

This change updates evsel to allow the combination of inherit
and PERF_SAMPLE_READ.

A fallback is implemented for kernel versions where this feature is not
supported.

The user must pass --stat option to perf record to opt into this new
behaviour.

Signed-off-by: Ben Gainey <[email protected]>
---
.../test-record-group-sampling-inherit-stat | 62 ++++++++++++++
tools/perf/util/evsel.c | 82 ++++++++++++++++++-
tools/perf/util/evsel.h | 1 +
3 files changed, 143 insertions(+), 2 deletions(-)
create mode 100644 tools/perf/tests/attr/test-record-group-sampling-inherit-stat

diff --git a/tools/perf/tests/attr/test-record-group-sampling-inherit-stat b/tools/perf/tests/attr/test-record-group-sampling-inherit-stat
new file mode 100644
index 000000000000..281a17acb6ae
--- /dev/null
+++ b/tools/perf/tests/attr/test-record-group-sampling-inherit-stat
@@ -0,0 +1,62 @@
+[config]
+command = record
+args = --stat --no-bpf-event -c 10000 -e '{cycles,cache-misses}:S' kill >/dev/null 2>&1
+ret = 1
+
+[event-1:base-record]
+fd=1
+group_fd=-1
+
+# cycles
+type=0
+config=0
+
+# default | PERF_SAMPLE_READ
+sample_type=87
+
+# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
+read_format=31
+task=1
+mmap=1
+comm=1
+enable_on_exec=1
+disabled=1
+
+# inherit is enabled for group sampling, as is inherit_stat
+inherit=1
+inherit_stat=1
+
+# sampling disabled
+sample_freq=0
+sample_period=10000
+freq=0
+write_backward=0
+
+[event-2:base-record]
+fd=2
+group_fd=1
+
+# cache-misses
+type=0
+config=3
+
+# default | PERF_SAMPLE_READ
+sample_type=87
+
+# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
+read_format=31
+task=0
+mmap=0
+comm=0
+enable_on_exec=0
+disabled=0
+
+# inherit is enabled for group sampling, as is inherit_stat
+inherit=1
+inherit_stat=1
+
+# sampling disabled
+sample_freq=0
+sample_period=0
+freq=0
+write_backward=0
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6d7c9c58a9bc..aec6b4f5264e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1097,6 +1097,64 @@ static bool evsel__is_offcpu_event(struct evsel *evsel)
return evsel__is_bpf_output(evsel) && evsel__name_is(evsel, OFFCPU_EVENT);
}

+static bool evsel__has_term_type(struct evsel *evsel, enum evsel_term_type type)
+{
+ return __evsel__get_config_term(evsel, type) != NULL;
+}
+
+/*
+ * Determine whether or not an evsel can support inherit+inherit_stat+PERF_SAMPLE_READ.
+ *
+ * In order not to break existing command line behaviour, this configuration
+ * will only be enabled if certain specific requirements are met:
+ *
+ * 1) When making a system-wide capture, there is no need to support this
+ * configuration. Likewise, if the user disables inherit, or does not request
+ * inherit_stat, then the configuration is not supported.
+ * 2) If the user explicitly specifies 'freq' as a config term, then do not
+ * support this feature, as frequency counters are not compatible.
+ * 3) If the user explicitly specifies 'period' as a config term, then the
+ * feature is compatible with that event.
+ * 4) If neither was explicitly set, and the event is part of a group, then
+ * base the decision on the leader.
+ * 5) Otherwise base the decision on whether or not the user specified a period
+ * or frequency on the command line (which includes the default frequency
+ * setting).
+ */
+static bool evsel__compat_with_inherit_sample_read(struct record_opts *opts,
+ struct evsel *leader,
+ struct evsel *evsel)
+{
+ struct perf_event_attr *attr = &evsel->core.attr;
+
+ if (opts->target.system_wide)
+ return false;
+
+ if (opts->no_inherit_set || !opts->inherit_stat)
+ return false;
+
+ if (evsel__has_term_type(evsel, EVSEL__CONFIG_TERM_FREQ))
+ return false;
+
+ if (evsel__has_term_type(evsel, EVSEL__CONFIG_TERM_PERIOD))
+ return true;
+
+ if (leader && (leader != evsel)) {
+ struct perf_event_attr *ldr_att = &leader->core.attr;
+
+ if (evsel__has_term_type(leader, EVSEL__CONFIG_TERM_FREQ))
+ return false;
+
+ if (evsel__has_term_type(leader, EVSEL__CONFIG_TERM_PERIOD))
+ return true;
+
+ if (ldr_att->freq)
+ return false;
+ }
+
+ return (!attr->freq && !opts->freq);
+}
+
/*
* The enable_on_exec/disabled value strategy:
*
@@ -1133,6 +1191,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
int track = evsel->tracking;
bool per_cpu = opts->target.default_per_cpu && !opts->target.per_thread;

+ bool allow_inherit_stat_sample_read = evsel__compat_with_inherit_sample_read(
+ opts, leader, evsel);
+
attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
attr->inherit = !opts->no_inherit;
attr->write_backward = opts->overwrite ? 1 : 0;
@@ -1156,7 +1217,17 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
*/
if (leader->core.nr_members > 1) {
attr->read_format |= PERF_FORMAT_GROUP;
- attr->inherit = 0;
+ if (!allow_inherit_stat_sample_read)
+ attr->inherit = 0;
+ }
+
+ /*
+ * Inherit + READ requires inherit_stat, but only if freq is
+ * not set as the two are incompatible
+ */
+ if (attr->inherit && allow_inherit_stat_sample_read) {
+ attr->inherit_stat = 1;
+ evsel__set_sample_bit(evsel, TID);
}
}

@@ -1832,6 +1903,8 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,

static void evsel__disable_missing_features(struct evsel *evsel)
{
+ if (perf_missing_features.inherit_sample_read)
+ evsel->core.attr.inherit = 0;
if (perf_missing_features.branch_counters)
evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_COUNTERS;
if (perf_missing_features.read_lost)
@@ -1887,7 +1960,12 @@ bool evsel__detect_missing_features(struct evsel *evsel)
* Must probe features in the order they were added to the
* perf_event_attr interface.
*/
- if (!perf_missing_features.branch_counters &&
+ if (!perf_missing_features.inherit_sample_read &&
+ evsel->core.attr.inherit && (evsel->core.attr.sample_type & PERF_SAMPLE_READ)) {
+ perf_missing_features.inherit_sample_read = true;
+ pr_debug2("Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit, falling back to no-inherit.\n");
+ return true;
+ } else if (!perf_missing_features.branch_counters &&
(evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
perf_missing_features.branch_counters = true;
pr_debug2("switching off branch counters support\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index efbb6e848287..11cc9b8bee27 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -192,6 +192,7 @@ struct perf_missing_features {
bool weight_struct;
bool read_lost;
bool branch_counters;
+ bool inherit_sample_read;
};

extern struct perf_missing_features perf_missing_features;
--
2.44.0


2024-03-23 01:23:00

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

On Fri, Mar 22, 2024 at 9:42 AM Ben Gainey <[email protected]> wrote:
>
> This change allows events to use PERF_SAMPLE READ with inherit so long
> as both inherit_stat and PERF_SAMPLE_TID are set.
>
> Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
> restriction assumes the user is interested in collecting aggregate
> statistics as per `perf stat`. It prevents a user from collecting
> per-thread samples using counter groups from a multi-threaded or
> multi-process application, as with `perf record -e '{....}:S'`. Instead
> users must use system-wide mode, or forgo the ability to sample counter
> groups. System-wide mode is often problematic as it requires specific
> permissions (no CAP_PERFMON / root access), or may lead to capture of
> significant amounts of extra data from other processes running on the
> system.
>
> Perf already supports the ability to collect per-thread counts with
> `inherit` via the `inherit_stat` flag. This patch changes

I'm not sure about this part. IIUC inherit and inherit_stat is not for
per-thread counts, it only supports per-process (including children)
events.


> `perf_event_alloc` relaxing the restriction to combine `inherit` with
> `PERF_SAMPLE_READ` so that the combination will be allowed so long as
> `inherit_stat` and `PERF_SAMPLE_TID` are enabled.

Anyway, does it really need 'inherit_stat'? I think it's only for
counting use cases (e.g. 'perf stat') not for sampling.

Also technically, it can have PERF_SAMPLE_STREAM_ID instead
of PERF_SAMPLE_TID to distinguish the counter values.

>
> In this configuration stream ids (such as may appear in the read_format
> field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
> the pair of (stream id, tid) uniquely identify each event. Tools that
> rely on this, for example to calculate a delta between samples, would
> need updating to take this into account. Previously valid event
> configurations (system-wide, no-inherit and so on) where each stream id
> is the identifier are unaffected.

I think you meant PERF_SAMPLE_ID not PERF_SAMPLE_STREAM_ID.
IIUC the stream id is already unique.

Thanks,
Namhyung

>
>
> Changes since v3:
> - Cleaned up perf test data changes incorrectly included into this
> series from elsewhere.
>
> Changes since v2:
> - Rebase on v6.8
> - Respond to James Clarke's feedback; fixup some typos and move some
> repeated checks into a helper macro.
> - Cleaned up checkpatch lints.
> - Updated perf test; fixed evsel handling so that existing tests pass
> and added new tests to cover the new behaviour.
>
> Changes since v1:
> - Rebase on v6.8-rc1
> - Fixed value written into sample after child exists.
> - Modified handling of switch-out so that context with these events
> take the slow path, so that the per-event/per-thread PMU state is
> correctly switched.
> - Modified perf tools to support this mode of operation.
>
> Ben Gainey (4):
> perf: Support PERF_SAMPLE_READ with inherit_stat
> tools/perf: Track where perf_sample_ids need per-thread periods
> tools/perf: Correctly calculate sample period for inherited
> SAMPLE_READ values
> tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when
> opening events
>
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 62 ++++++++++----
> tools/lib/perf/evlist.c | 1 +
> tools/lib/perf/evsel.c | 48 +++++++++++
> tools/lib/perf/include/internal/evsel.h | 55 ++++++++++++-
> .../test-record-group-sampling-inherit-stat | 62 ++++++++++++++
> tools/perf/util/evsel.c | 82 ++++++++++++++++++-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/session.c | 11 ++-
> 9 files changed, 301 insertions(+), 22 deletions(-)
> create mode 100644 tools/perf/tests/attr/test-record-group-sampling-inherit-stat
>
> --
> 2.44.0
>

2024-03-25 14:19:10

by Ben Gainey

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

On Fri, 2024-03-22 at 18:22 -0700, Namhyung Kim wrote:
> On Fri, Mar 22, 2024 at 9:42 AM Ben Gainey <[email protected]>
> wrote:
> >
> > This change allows events to use PERF_SAMPLE READ with inherit so
> > long
> > as both inherit_stat and PERF_SAMPLE_TID are set.
> >
> > Currently it is not possible to use PERF_SAMPLE_READ with inherit.
> > This
> > restriction assumes the user is interested in collecting aggregate
> > statistics as per `perf stat`. It prevents a user from collecting
> > per-thread samples using counter groups from a multi-threaded or
> > multi-process application, as with `perf record -e '{....}:S'`.
> > Instead
> > users must use system-wide mode, or forgo the ability to sample
> > counter
> > groups. System-wide mode is often problematic as it requires
> > specific
> > permissions (no CAP_PERFMON / root access), or may lead to capture
> > of
> > significant amounts of extra data from other processes running on
> > the
> > system.
> >
> > Perf already supports the ability to collect per-thread counts with
> > `inherit` via the `inherit_stat` flag. This patch changes
>
> I'm not sure about this part. IIUC inherit and inherit_stat is not
> for
> per-thread counts, it only supports per-process (including children)
> events.

Hi Namhyung

Thanks for the comments...

I don't think this is correct, if you compare the behaviour of

perf record --no-inherit ... <some-forking-processes>
perf script -F pid,tid | sort -u
and
perf record --no-inherit ... <some-multithreaded-processes>
perf script -F pid,tid | sort -u

vs

perf record ... <some-forking-processes>
perf script -F pid,tid | sort -u
and
perf record .. <some-multithreaded-processes>
perf script -F pid,tid | sort -u

The behaviour is consistent with the fact that no-inherit only records
the primary thread of the primary process, whereas in the inherit case
any child tasks (either threads or forked processes) is recorded.


>
>
> > `perf_event_alloc` relaxing the restriction to combine `inherit`
> > with
> > `PERF_SAMPLE_READ` so that the combination will be allowed so long
> > as
> > `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
>
> Anyway, does it really need 'inherit_stat'? I think it's only for
> counting use cases (e.g. 'perf stat') not for sampling.


I would be very happy to remove the inherit_stat requirement. When I
first came to this it seemed like the logic was all there in
inherit_stat already, but now that I have to take a different path in
`perf_event_context_sched_out` I suspect it should be trivial to remove
the inherit_stat requirement.


>
> Also technically, it can have PERF_SAMPLE_STREAM_ID instead
> of PERF_SAMPLE_TID to distinguish the counter values.


It looks like you are correct, but the ID given in the read_format part
of PERF_SAMPLE_RECORD is the ID rather than STREAM_ID. (I had
incorrectly thought/stated it was the latter). Hence when processing
the read_format values in the sample record, we either need to use the
TID to uniquely identify the source, or we would need to modify the
read_format to (additionally) include the STREAM_ID.

* The current approach in tools uses the ID+TID, which puts more
complexity in the tools but means there isn't an extra field in the
read_format data (for each value).
* Alternatively I could introduce a PERF_FORMAT_STREAM_ID; I would
expect that the user/tool would need to specify
PERF_FORMAT_ID|PERF_FORMAT_STREAM_ID as they would need to use the ID
to lookup the correct perf_event_attr, but could use the STREAM_ID to
uniquely identify the child event. This approach would add an extra u64
per value in the read_format data but is possibly simpler/safer for
tools?

Any preferences?


>
> >
> > In this configuration stream ids (such as may appear in the
> > read_format
> > field of a PERF_RECORD_SAMPLE) are no longer globally unique,
> > rather
> > the pair of (stream id, tid) uniquely identify each event. Tools
> > that
> > rely on this, for example to calculate a delta between samples,
> > would
> > need updating to take this into account. Previously valid event
> > configurations (system-wide, no-inherit and so on) where each
> > stream id
> > is the identifier are unaffected.
>
> I think you meant PERF_SAMPLE_ID not PERF_SAMPLE_STREAM_ID.
> IIUC the stream id is already unique.
>

Yes my mistake.


> Thanks,
> Namhyung
>
> >
> >
> > Changes since v3:
> > - Cleaned up perf test data changes incorrectly included into this
> > series from elsewhere.
> >
> > Changes since v2:
> > - Rebase on v6.8
> > - Respond to James Clarke's feedback; fixup some typos and move
> > some
> > repeated checks into a helper macro.
> > - Cleaned up checkpatch lints.
> > - Updated perf test; fixed evsel handling so that existing tests
> > pass
> > and added new tests to cover the new behaviour.
> >
> > Changes since v1:
> > - Rebase on v6.8-rc1
> > - Fixed value written into sample after child exists.
> > - Modified handling of switch-out so that context with these
> > events
> > take the slow path, so that the per-event/per-thread PMU state
> > is
> > correctly switched.
> > - Modified perf tools to support this mode of operation.
> >
> > Ben Gainey (4):
> > perf: Support PERF_SAMPLE_READ with inherit_stat
> > tools/perf: Track where perf_sample_ids need per-thread periods
> > tools/perf: Correctly calculate sample period for inherited
> > SAMPLE_READ values
> > tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when
> > opening events
> >
> > include/linux/perf_event.h | 1 +
> > kernel/events/core.c | 62 ++++++++++----
> > tools/lib/perf/evlist.c | 1 +
> > tools/lib/perf/evsel.c | 48 +++++++++++
> > tools/lib/perf/include/internal/evsel.h | 55 ++++++++++++-
> > .../test-record-group-sampling-inherit-stat | 62 ++++++++++++++
> > tools/perf/util/evsel.c | 82
> > ++++++++++++++++++-
> > tools/perf/util/evsel.h | 1 +
> > tools/perf/util/session.c | 11 ++-
> > 9 files changed, 301 insertions(+), 22 deletions(-)
> > create mode 100644 tools/perf/tests/attr/test-record-group-
> > sampling-inherit-stat
> >
> > --
> > 2.44.0
> >

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2024-03-27 20:34:42

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

Hello,

On Mon, Mar 25, 2024 at 3:12 AM Ben Gainey <[email protected]> wrote:
>
> On Fri, 2024-03-22 at 18:22 -0700, Namhyung Kim wrote:
> > On Fri, Mar 22, 2024 at 9:42 AM Ben Gainey <[email protected]>
> > wrote:
> > >
> > > This change allows events to use PERF_SAMPLE READ with inherit so
> > > long
> > > as both inherit_stat and PERF_SAMPLE_TID are set.
> > >
> > > Currently it is not possible to use PERF_SAMPLE_READ with inherit.
> > > This
> > > restriction assumes the user is interested in collecting aggregate
> > > statistics as per `perf stat`. It prevents a user from collecting
> > > per-thread samples using counter groups from a multi-threaded or
> > > multi-process application, as with `perf record -e '{....}:S'`.
> > > Instead
> > > users must use system-wide mode, or forgo the ability to sample
> > > counter
> > > groups. System-wide mode is often problematic as it requires
> > > specific
> > > permissions (no CAP_PERFMON / root access), or may lead to capture
> > > of
> > > significant amounts of extra data from other processes running on
> > > the
> > > system.
> > >
> > > Perf already supports the ability to collect per-thread counts with
> > > `inherit` via the `inherit_stat` flag. This patch changes
> >
> > I'm not sure about this part. IIUC inherit and inherit_stat is not
> > for
> > per-thread counts, it only supports per-process (including children)
> > events.
>
> Hi Namhyung
>
> Thanks for the comments...
>
> I don't think this is correct, if you compare the behaviour of
>
> perf record --no-inherit ... <some-forking-processes>
> perf script -F pid,tid | sort -u
> and
> perf record --no-inherit ... <some-multithreaded-processes>
> perf script -F pid,tid | sort -u
>
> vs
>
> perf record ... <some-forking-processes>
> perf script -F pid,tid | sort -u
> and
> perf record .. <some-multithreaded-processes>
> perf script -F pid,tid | sort -u
>
> The behaviour is consistent with the fact that no-inherit only records
> the primary thread of the primary process, whereas in the inherit case
> any child tasks (either threads or forked processes) is recorded.

Right, I was talking about the counting behavior not sampling
as inherit_stat is only for the counting. I think it'd return an error
if event attr has both sample_freq and inherit_stat.


> >
> >
> > > `perf_event_alloc` relaxing the restriction to combine `inherit`
> > > with
> > > `PERF_SAMPLE_READ` so that the combination will be allowed so long
> > > as
> > > `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
> >
> > Anyway, does it really need 'inherit_stat'? I think it's only for
> > counting use cases (e.g. 'perf stat') not for sampling.
>
>
> I would be very happy to remove the inherit_stat requirement. When I
> first came to this it seemed like the logic was all there in
> inherit_stat already, but now that I have to take a different path in
> `perf_event_context_sched_out` I suspect it should be trivial to remove
> the inherit_stat requirement.

ok.

> >
> > Also technically, it can have PERF_SAMPLE_STREAM_ID instead
> > of PERF_SAMPLE_TID to distinguish the counter values.
>
>
> It looks like you are correct, but the ID given in the read_format part
> of PERF_SAMPLE_RECORD is the ID rather than STREAM_ID. (I had
> incorrectly thought/stated it was the latter). Hence when processing
> the read_format values in the sample record, we either need to use the
> TID to uniquely identify the source, or we would need to modify the
> read_format to (additionally) include the STREAM_ID.
>
> * The current approach in tools uses the ID+TID, which puts more
> complexity in the tools but means there isn't an extra field in the
> read_format data (for each value).
> * Alternatively I could introduce a PERF_FORMAT_STREAM_ID; I would
> expect that the user/tool would need to specify
> PERF_FORMAT_ID|PERF_FORMAT_STREAM_ID as they would need to use the ID
> to lookup the correct perf_event_attr, but could use the STREAM_ID to
> uniquely identify the child event. This approach would add an extra u64
> per value in the read_format data but is possibly simpler/safer for
> tools?
>
> Any preferences?

I think it's better to use TID + ID. IIUC there's no way to track
STREAM_ID for new children other than getting it from sample.
As sample has TID already it'd be meaningless using STREAM_ID
to distinguish events.

Thanks,
Namhyung

2024-04-02 10:30:20

by Ben Gainey

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

On Wed, 2024-03-27 at 13:34 -0700, Namhyung Kim wrote:
> Hello,
>
> On Mon, Mar 25, 2024 at 3:12 AM Ben Gainey <[email protected]>
> wrote:
> >
> > On Fri, 2024-03-22 at 18:22 -0700, Namhyung Kim wrote:
> > > On Fri, Mar 22, 2024 at 9:42 AM Ben Gainey <[email protected]>
> > > wrote:
> > > >
> > > > This change allows events to use PERF_SAMPLE READ with inherit
> > > > so
> > > > long
> > > > as both inherit_stat and PERF_SAMPLE_TID are set.
> > > >
> > > > Currently it is not possible to use PERF_SAMPLE_READ with
> > > > inherit.
> > > > This
> > > > restriction assumes the user is interested in collecting
> > > > aggregate
> > > > statistics as per `perf stat`. It prevents a user from
> > > > collecting
> > > > per-thread samples using counter groups from a multi-threaded
> > > > or
> > > > multi-process application, as with `perf record -e '{....}:S'`.
> > > > Instead
> > > > users must use system-wide mode, or forgo the ability to sample
> > > > counter
> > > > groups. System-wide mode is often problematic as it requires
> > > > specific
> > > > permissions (no CAP_PERFMON / root access), or may lead to
> > > > capture
> > > > of
> > > > significant amounts of extra data from other processes running
> > > > on
> > > > the
> > > > system.
> > > >
> > > > Perf already supports the ability to collect per-thread counts
> > > > with
> > > > `inherit` via the `inherit_stat` flag. This patch changes
> > >
> > > I'm not sure about this part. IIUC inherit and inherit_stat is
> > > not
> > > for
> > > per-thread counts, it only supports per-process (including
> > > children)
> > > events.
> >
> > Hi Namhyung
> >
> > Thanks for the comments...
> >
> > I don't think this is correct, if you compare the behaviour of
> >
> > perf record --no-inherit ... <some-forking-processes>
> > perf script -F pid,tid | sort -u
> > and
> > perf record --no-inherit ... <some-multithreaded-processes>
> > perf script -F pid,tid | sort -u
> >
> > vs
> >
> > perf record ... <some-forking-processes>
> > perf script -F pid,tid | sort -u
> > and
> > perf record .. <some-multithreaded-processes>
> > perf script -F pid,tid | sort -u
> >
> > The behaviour is consistent with the fact that no-inherit only
> > records
> > the primary thread of the primary process, whereas in the inherit
> > case
> > any child tasks (either threads or forked processes) is recorded.
>
> Right, I was talking about the counting behavior not sampling
> as inherit_stat is only for the counting. I think it'd return an
> error
> if event attr has both sample_freq and inherit_stat.
>
>
> > >
> > >
> > > > `perf_event_alloc` relaxing the restriction to combine
> > > > `inherit`
> > > > with
> > > > `PERF_SAMPLE_READ` so that the combination will be allowed so
> > > > long
> > > > as
> > > > `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
> > >
> > > Anyway, does it really need 'inherit_stat'? I think it's only
> > > for
> > > counting use cases (e.g. 'perf stat') not for sampling.
> >
> >
> > I would be very happy to remove the inherit_stat requirement. When
> > I
> > first came to this it seemed like the logic was all there in
> > inherit_stat already, but now that I have to take a different path
> > in
> > `perf_event_context_sched_out` I suspect it should be trivial to
> > remove
> > the inherit_stat requirement.
>
> ok.

>
> > >
> > > Also technically, it can have PERF_SAMPLE_STREAM_ID instead
> > > of PERF_SAMPLE_TID to distinguish the counter values.
> >
> >
> > It looks like you are correct, but the ID given in the read_format
> > part
> > of PERF_SAMPLE_RECORD is the ID rather than STREAM_ID. (I had
> > incorrectly thought/stated it was the latter). Hence when
> > processing
> > the read_format values in the sample record, we either need to use
> > the
> > TID to uniquely identify the source, or we would need to modify the
> > read_format to (additionally) include the STREAM_ID.
> >
> > * The current approach in tools uses the ID+TID, which puts more
> > complexity in the tools but means there isn't an extra field in the
> > read_format data (for each value).
> > * Alternatively I could introduce a PERF_FORMAT_STREAM_ID; I would
> > expect that the user/tool would need to specify
> > PERF_FORMAT_ID|PERF_FORMAT_STREAM_ID as they would need to use the
> > ID
> > to lookup the correct perf_event_attr, but could use the STREAM_ID
> > to
> > uniquely identify the child event. This approach would add an extra
> > u64
> > per value in the read_format data but is possibly simpler/safer for
> > tools?
> >
> > Any preferences?
>
> I think it's better to use TID + ID. IIUC there's no way to track
> STREAM_ID for new children other than getting it from sample.
> As sample has TID already it'd be meaningless using STREAM_ID
> to distinguish events.


So i did implement a prototype version of this where:

PERF_FORMAT_STREAM_ID is added, and `perf record` is modified to
request that in addition to PERF_FORMAT_ID.

The PERF_FORMAT_ID is necessary to identify which perf_event_attr/evsel
the counter represents, and the PERF_FORMAT_STREAM_ID uniquely
identifies the child thread. It requires almost all the same plumbing
as the previous implementation... I just modified my
`perf_sample_id__get_period_storage` in patch 3/4 to take the u64
stream id instead of the u32 tid. I expect there could be some cleanup
there, but I guess it highlights the fact that they amount to the same
outcome.

I'll park that and push out a new patch set with just the inherit
requirement.

Thanks

Ben
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.