This change allows events to use PERF_SAMPLE_READ with inherit so long
as PERF_SAMPLE_TID is also set.
Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
restriction assumes the user is interested in collecting aggregate
statistics as per `perf stat`. It prevents a user from collecting
per-thread samples using counter groups from a multi-threaded or
multi-process application, as with `perf record -e '{....}:S'`. Instead
users must use system-wide mode, or forgo the ability to sample counter
groups. System-wide mode is often problematic: it requires specific
permissions (CAP_PERFMON or root access) that may not be available, and
it may capture significant amounts of extra data from other processes
running on the system.
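For illustration only (this snippet is not part of the series, and the
event, period and flag choices here are arbitrary), the combination that
the series makes acceptable looks roughly as follows; previously
perf_event_open() rejected inherit together with PERF_SAMPLE_READ
outright:

  #include <linux/perf_event.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* Sketch: open a sampling group leader that is inherited into child
   * threads/processes and reads the whole group on each sample. With
   * this series the open succeeds provided PERF_SAMPLE_TID is set
   * alongside PERF_SAMPLE_READ. */
  static int open_inherited_group_leader(void)
  {
          struct perf_event_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = PERF_TYPE_HARDWARE;
          attr.config = PERF_COUNT_HW_CPU_CYCLES;
          attr.sample_period = 100000;                    /* arbitrary */
          attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_READ;
          attr.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
          attr.inherit = 1;

          /* pid = 0, cpu = -1: this task and its children, on any CPU */
          return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
  }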
This patch changes `perf_event_alloc`, relaxing the restriction against
combining `inherit` with `PERF_SAMPLE_READ` so that the combination is
allowed so long as `PERF_SAMPLE_TID` is enabled. It modifies
sampling so that only the count associated with the active thread is
recorded into the buffer. It modifies the context switch handling so
that perf contexts are always switched out if they have this kind of
event so that the correct per-thread state is maintained. Finally, the
tools are updated to allow perf record to specify this combination and
to correctly decode the sample data.
In this configuration, stream ids (such as may appear in the read_format
field of a PERF_RECORD_SAMPLE) are no longer sufficient on their own to
identify a sequence of counter values; instead the pair of
(stream id, tid) uniquely identifies each sequence. Tools that rely on
this, for example to calculate a delta between samples, would need
updating to take this into account. Previously valid event
configurations (system-wide, no-inherit and so on), where each stream id
alone is the identifier, are unaffected.
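As a sketch of the tool-side implication (illustrative only; the toy
fixed-size table and function name below are invented and not code from
this series), a consumer that previously kept one previous total per
stream id would instead key on the (stream id, tid) pair:

  #include <linux/types.h>

  /* Toy accumulator keyed by (id, tid) rather than by id alone; a real
   * tool would use a hash table, as the later tools patches do. */
  struct prev_total {
          __u64 id;
          __u32 tid;
          __u64 value;
  };

  static struct prev_total prev[1024];
  static int nr_prev;

  /* Return the period delta for this (id, tid), remembering the new total. */
  static __u64 account_sample(__u64 id, __u32 tid, __u64 value)
  {
          for (int i = 0; i < nr_prev; i++) {
                  if (prev[i].id == id && prev[i].tid == tid) {
                          __u64 delta = value - prev[i].value;

                          prev[i].value = value;
                          return delta;
                  }
          }

          /* First sample seen for this (id, tid); table growth omitted. */
          if (nr_prev < 1024) {
                  prev[nr_prev].id = id;
                  prev[nr_prev].tid = tid;
                  prev[nr_prev].value = value;
                  nr_prev++;
          }
          return value;
  }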
Changes since v5:
- Rebase on v6.9
- Cleanup feedback from Namhyung Kim
Changes since v4:
- Rebase on v6.9-rc1
- Removed the dependency on inherit_stat that was previously assumed
necessary as per feedback from Namhyung Kim.
- Fixed an incorrect use of zfree instead of free in the tools leading
to an abort on tool shutdown.
- Additional test coverage improvements added to perf test.
- Cleaned up the remaining bit of irrelevant change missed between v3
and v4.
Changes since v3:
- Cleaned up perf test data changes incorrectly included into this
series from elsewhere.
Changes since v2:
- Rebase on v6.8
- Respond to James Clark's feedback; fixup some typos and move some
repeated checks into a helper macro.
- Cleaned up checkpatch lints.
- Updated perf test; fixed evsel handling so that existing tests pass
and added new tests to cover the new behaviour.
Changes since v1:
- Rebase on v6.8-rc1
- Fixed value written into sample after child exits.
- Modified handling of switch-out so that contexts with these events
take the slow path, so that the per-event/per-thread PMU state is
correctly switched.
- Modified perf tools to support this mode of operation.
Ben Gainey (4):
perf: Support PERF_SAMPLE_READ with inherit
tools/perf: Track where perf_sample_ids need per-thread periods
tools/perf: Correctly calculate sample period for inherited
SAMPLE_READ values
tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events
include/linux/perf_event.h | 1 +
kernel/events/core.c | 78 ++++++++++++++-----
tools/lib/perf/evlist.c | 1 +
tools/lib/perf/evsel.c | 48 ++++++++++++
tools/lib/perf/include/internal/evsel.h | 54 ++++++++++++-
tools/perf/tests/attr/README | 2 +
.../tests/attr/test-record-group-sampling | 39 ----------
.../tests/attr/test-record-group-sampling1 | 50 ++++++++++++
.../tests/attr/test-record-group-sampling2 | 60 ++++++++++++++
tools/perf/tests/attr/test-record-group2 | 9 ++-
tools/perf/util/evsel.c | 19 ++++-
tools/perf/util/evsel.h | 1 +
tools/perf/util/session.c | 11 ++-
13 files changed, 302 insertions(+), 71 deletions(-)
delete mode 100644 tools/perf/tests/attr/test-record-group-sampling
create mode 100644 tools/perf/tests/attr/test-record-group-sampling1
create mode 100644 tools/perf/tests/attr/test-record-group-sampling2
--
2.45.1
perf_sample_ids and related evlist/evsel code are modified to track
which events combine inherit+PERF_SAMPLE_READ+PERF_SAMPLE_TID.
Events with this combination of properties must be handled differently
when calculating each sample period. They must use the combination
of (ID + TID) to uniquely identify each distinct sequence of values.
Signed-off-by: Ben Gainey <[email protected]>
---
tools/lib/perf/evlist.c | 1 +
tools/lib/perf/evsel.c | 7 +++++++
tools/lib/perf/include/internal/evsel.h | 9 +++++++++
3 files changed, 17 insertions(+)
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index c6d67fc9e57ef..d17288eeaee48 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -255,6 +255,7 @@ static void perf_evlist__id_hash(struct perf_evlist *evlist,
sid->id = id;
sid->evsel = evsel;
+ sid->period_per_thread = perf_evsel__attr_has_per_thread_sample_period(evsel);
hash = hash_64(sid->id, PERF_EVLIST__HLIST_BITS);
hlist_add_head(&sid->node, &evlist->heads[hash]);
}
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index c07160953224a..f7abb879f416e 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -537,6 +537,13 @@ void perf_evsel__free_id(struct perf_evsel *evsel)
evsel->ids = 0;
}
+bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
+{
+ return (evsel->attr.sample_type & PERF_SAMPLE_READ)
+ && (evsel->attr.sample_type & PERF_SAMPLE_TID)
+ && evsel->attr.inherit;
+}
+
void perf_counts_values__scale(struct perf_counts_values *count,
bool scale, __s8 *pscaled)
{
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 5cd220a61962e..f8de2bf89c76b 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -36,6 +36,13 @@ struct perf_sample_id {
/* Holds total ID period value for PERF_SAMPLE_READ processing. */
u64 period;
+
+ /*
+ * When inherit is combined with PERF_SAMPLE_READ, the period value is
+ * per (id, thread) tuple, rather than per id, so use the stream_id to
+ * uniquely identify the period, rather than the id.
+ */
+ bool period_per_thread;
};
struct perf_evsel {
@@ -88,4 +95,6 @@ int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter);
int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads);
void perf_evsel__free_id(struct perf_evsel *evsel);
+bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel);
+
#endif /* __LIBPERF_INTERNAL_EVSEL_H */
--
2.45.1
This change allows events to use PERF_SAMPLE_READ with inherit
so long as PERF_SAMPLE_TID is also set.
In this configuration, an event will be inherited into any
child processes / threads, allowing convenient profiling of a
multiprocess or multithreaded application, whilst allowing
profiling tools to collect per-thread samples, in particular
of groups of counters.
The read_format field of both PERF_RECORD_READ and PERF_RECORD_SAMPLE
is changed by this new configuration, but calls to `read()` on the same
event file descriptor are unaffected and continue to return the
cumulative total.
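For reference, the group read data carried in PERF_SAMPLE_READ keeps the
layout documented in perf_event_open(2). The pseudo-declaration below is
only a reading aid (optional fields exist only when the corresponding
read_format bit is set, so a real parser walks the buffer rather than
casting to a struct); the point is where the per-thread values land:

  #include <linux/types.h>

  /* Layout per perf_event_open(2) when PERF_FORMAT_GROUP is set. */
  struct group_read_format {
          __u64 nr;             /* number of events in the group */
          __u64 time_enabled;   /* if PERF_FORMAT_TOTAL_TIME_ENABLED */
          __u64 time_running;   /* if PERF_FORMAT_TOTAL_TIME_RUNNING */
          struct {
                  __u64 value;  /* in a PERF_RECORD_SAMPLE this is now the
                                 * sampled thread's count when inherit and
                                 * PERF_SAMPLE_READ are combined; read() on
                                 * the fd still returns cumulative totals */
                  __u64 id;     /* if PERF_FORMAT_ID */
                  __u64 lost;   /* if PERF_FORMAT_LOST */
          } values[];           /* nr entries */
  };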
Signed-off-by: Ben Gainey <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 78 ++++++++++++++++++++++++++++----------
2 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a15c0c6f8a9..e7eed33c50f16 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -932,6 +932,7 @@ struct perf_event_context {
int nr_task_data;
int nr_stat;
+ int nr_inherit_read;
int nr_freq;
int rotate_disable;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 724e6d7e128f3..e7c847956ebff 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1767,6 +1767,14 @@ perf_event_groups_next(struct perf_event *event, struct pmu *pmu)
event = rb_entry_safe(rb_next(&event->group_node), \
typeof(*event), group_node))
+/*
+ * Does the event attribute request inherit with PERF_SAMPLE_READ
+ */
+static inline bool has_inherit_and_sample_read(struct perf_event_attr *attr)
+{
+ return attr->inherit && (attr->sample_type & PERF_SAMPLE_READ);
+}
+
/*
* Add an event from the lists for its context.
* Must be called with ctx->mutex and ctx->lock held.
@@ -1797,6 +1805,8 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->nr_user++;
if (event->attr.inherit_stat)
ctx->nr_stat++;
+ if (has_inherit_and_sample_read(&event->attr))
+ ctx->nr_inherit_read++;
if (event->state > PERF_EVENT_STATE_OFF)
perf_cgroup_event_enable(event, ctx);
@@ -2021,6 +2031,8 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->nr_user--;
if (event->attr.inherit_stat)
ctx->nr_stat--;
+ if (has_inherit_and_sample_read(&event->attr))
+ ctx->nr_inherit_read--;
list_del_rcu(&event->event_entry);
@@ -3529,11 +3541,18 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
perf_ctx_disable(ctx, false);
/* PMIs are disabled; ctx->nr_pending is stable. */
- if (local_read(&ctx->nr_pending) ||
+ if (ctx->nr_inherit_read ||
+ next_ctx->nr_inherit_read ||
+ local_read(&ctx->nr_pending) ||
local_read(&next_ctx->nr_pending)) {
/*
* Must not swap out ctx when there's pending
* events that rely on the ctx->task relation.
+ *
+ * Likewise, when a context contains inherit +
+ * SAMPLE_READ events they should be switched
+ * out using the slow path so that they are
+ * treated as if they were distinct contexts.
*/
raw_spin_unlock(&next_ctx->lock);
rcu_read_unlock();
@@ -4533,11 +4552,19 @@ static void __perf_event_read(void *info)
raw_spin_unlock(&ctx->lock);
}
-static inline u64 perf_event_count(struct perf_event *event)
+static inline u64 perf_event_count_cumulative(struct perf_event *event)
{
return local64_read(&event->count) + atomic64_read(&event->child_count);
}
+static inline u64 perf_event_count(struct perf_event *event, bool self_value_only)
+{
+ if (self_value_only && has_inherit_and_sample_read(&event->attr))
+ return local64_read(&event->count);
+
+ return perf_event_count_cumulative(event);
+}
+
static void calc_timer_values(struct perf_event *event,
u64 *now,
u64 *enabled,
@@ -5454,7 +5481,7 @@ static u64 __perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *
mutex_lock(&event->child_mutex);
(void)perf_event_read(event, false);
- total += perf_event_count(event);
+ total += perf_event_count_cumulative(event);
*enabled += event->total_time_enabled +
atomic64_read(&event->child_total_time_enabled);
@@ -5463,7 +5490,7 @@ static u64 __perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *
list_for_each_entry(child, &event->child_list, child_list) {
(void)perf_event_read(child, false);
- total += perf_event_count(child);
+ total += perf_event_count_cumulative(child);
*enabled += child->total_time_enabled;
*running += child->total_time_running;
}
@@ -5545,14 +5572,14 @@ static int __perf_read_group_add(struct perf_event *leader,
/*
* Write {count,id} tuples for every sibling.
*/
- values[n++] += perf_event_count(leader);
+ values[n++] += perf_event_count_cumulative(leader);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&leader->lost_samples);
for_each_sibling_event(sub, leader) {
- values[n++] += perf_event_count(sub);
+ values[n++] += perf_event_count_cumulative(sub);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
@@ -6132,7 +6159,7 @@ void perf_event_update_userpage(struct perf_event *event)
++userpg->lock;
barrier();
userpg->index = perf_event_index(event);
- userpg->offset = perf_event_count(event);
+ userpg->offset = perf_event_count_cumulative(event);
if (userpg->index)
userpg->offset -= local64_read(&event->hw.prev_count);
@@ -7194,13 +7221,14 @@ void perf_event__output_id_sample(struct perf_event *event,
static void perf_output_read_one(struct perf_output_handle *handle,
struct perf_event *event,
- u64 enabled, u64 running)
+ u64 enabled, u64 running,
+ bool from_sample)
{
u64 read_format = event->attr.read_format;
u64 values[5];
int n = 0;
- values[n++] = perf_event_count(event);
+ values[n++] = perf_event_count(event, from_sample);
if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
values[n++] = enabled +
atomic64_read(&event->child_total_time_enabled);
@@ -7218,8 +7246,9 @@ static void perf_output_read_one(struct perf_output_handle *handle,
}
static void perf_output_read_group(struct perf_output_handle *handle,
- struct perf_event *event,
- u64 enabled, u64 running)
+ struct perf_event *event,
+ u64 enabled, u64 running,
+ bool from_sample)
{
struct perf_event *leader = event->group_leader, *sub;
u64 read_format = event->attr.read_format;
@@ -7245,7 +7274,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
(leader->state == PERF_EVENT_STATE_ACTIVE))
leader->pmu->read(leader);
- values[n++] = perf_event_count(leader);
+ values[n++] = perf_event_count(leader, from_sample);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
@@ -7260,7 +7289,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
(sub->state == PERF_EVENT_STATE_ACTIVE))
sub->pmu->read(sub);
- values[n++] = perf_event_count(sub);
+ values[n++] = perf_event_count(sub, from_sample);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
@@ -7281,9 +7310,14 @@ static void perf_output_read_group(struct perf_output_handle *handle,
* The problem is that its both hard and excessively expensive to iterate the
* child list, not to mention that its impossible to IPI the children running
* on another CPU, from interrupt/NMI context.
+ *
+ * Instead the combination of PERF_SAMPLE_READ and inherit will track per-thread
+ * counts rather than attempting to accumulate some value across all children on
+ * all cores.
*/
static void perf_output_read(struct perf_output_handle *handle,
- struct perf_event *event)
+ struct perf_event *event,
+ bool from_sample)
{
u64 enabled = 0, running = 0, now;
u64 read_format = event->attr.read_format;
@@ -7301,9 +7335,9 @@ static void perf_output_read(struct perf_output_handle *handle,
calc_timer_values(event, &now, &enabled, &running);
if (event->attr.read_format & PERF_FORMAT_GROUP)
- perf_output_read_group(handle, event, enabled, running);
+ perf_output_read_group(handle, event, enabled, running, from_sample);
else
- perf_output_read_one(handle, event, enabled, running);
+ perf_output_read_one(handle, event, enabled, running, from_sample);
}
void perf_output_sample(struct perf_output_handle *handle,
@@ -7343,7 +7377,7 @@ void perf_output_sample(struct perf_output_handle *handle,
perf_output_put(handle, data->period);
if (sample_type & PERF_SAMPLE_READ)
- perf_output_read(handle, event);
+ perf_output_read(handle, event, true);
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
int size = 1;
@@ -7944,7 +7978,7 @@ perf_event_read_event(struct perf_event *event,
return;
perf_output_put(&handle, read_event);
- perf_output_read(&handle, event);
+ perf_output_read(&handle, event, false);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
@@ -12006,10 +12040,12 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
local64_set(&hwc->period_left, hwc->sample_period);
/*
- * We currently do not support PERF_SAMPLE_READ on inherited events.
+ * We do not support PERF_SAMPLE_READ on inherited events unless
+ * PERF_SAMPLE_TID is also selected, which allows inherited events to
+ * collect per-thread samples.
* See perf_output_read().
*/
- if (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ))
+ if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
goto err_ns;
if (!has_branch_stack(event))
@@ -13033,7 +13069,7 @@ static void sync_child_event(struct perf_event *child_event)
perf_event_read_event(child_event, task);
}
- child_val = perf_event_count(child_event);
+ child_val = perf_event_count_cumulative(child_event);
/*
* Add back the child's count to the parent's count:
--
2.45.1
Sample period calculation is updated so that events with
inherit + PERF_SAMPLE_READ + PERF_SAMPLE_TID use the TID in combination
with the ID field in the read_format data to identify which value holds
the previously accumulated counter total, from which the period delta
since the last sample is calculated.
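As a made-up illustration of the keying: if the stored total for
(id=10, tid=1001) is 5000 and a new sample from tid 1001 carries the
value 7500, the reported period is 2500 and the stored total becomes
7500; a sample from tid 1002 carrying 300 for the same id is tracked
separately and reports a period of 300, rather than computing a
meaningless delta against tid 1001's total.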
perf_sample_id is modified to support tracking per-thread
values, along with the existing global per-id values. In the
per-thread case, values are stored in a hash by TID within the
perf_sample_id, and are dynamically allocated as the number of threads
is not known ahead of time.
deliver_sample_value is modified to correctly locate the previous
sample storage based on the attribute, stream id and thread id.
Signed-off-by: Ben Gainey <[email protected]>
---
tools/lib/perf/evsel.c | 41 ++++++++++++++++++++++
tools/lib/perf/include/internal/evsel.h | 45 +++++++++++++++++++++++--
tools/perf/util/session.c | 11 ++++--
3 files changed, 92 insertions(+), 5 deletions(-)
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index f7abb879f416e..26f3d7ba0f267 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -5,6 +5,7 @@
#include <perf/evsel.h>
#include <perf/cpumap.h>
#include <perf/threadmap.h>
+#include <linux/hash.h>
#include <linux/list.h>
#include <internal/evsel.h>
#include <linux/zalloc.h>
@@ -23,6 +24,7 @@ void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
int idx)
{
INIT_LIST_HEAD(&evsel->node);
+ INIT_LIST_HEAD(&evsel->per_stream_periods);
evsel->attr = *attr;
evsel->idx = idx;
evsel->leader = evsel;
@@ -531,10 +533,17 @@ int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads)
void perf_evsel__free_id(struct perf_evsel *evsel)
{
+ struct perf_sample_id_period *pos, *n;
+
xyarray__delete(evsel->sample_id);
evsel->sample_id = NULL;
zfree(&evsel->id);
evsel->ids = 0;
+
+ perf_evsel_for_each_per_thread_period_safe(evsel, n, pos) {
+ list_del_init(&pos->node);
+ free(pos);
+ }
}
bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
@@ -544,6 +553,38 @@ bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
&& evsel->attr.inherit;
}
+u64 *perf_sample_id__get_period_storage(struct perf_sample_id *sid, u32 tid)
+{
+ struct hlist_head *head;
+ struct perf_sample_id_period *res;
+ int hash;
+
+ if (!sid->period_per_thread)
+ return &sid->period;
+
+ hash = hash_32(tid, PERF_SAMPLE_ID__HLIST_BITS);
+ head = &sid->periods[hash];
+
+ hlist_for_each_entry(res, head, hnode)
+ if (res->tid == tid)
+ return &res->period;
+
+ if (sid->evsel == NULL)
+ return NULL;
+
+ res = zalloc(sizeof(struct perf_sample_id_period));
+ if (res == NULL)
+ return NULL;
+
+ INIT_LIST_HEAD(&res->node);
+ res->tid = tid;
+
+ list_add_tail(&res->node, &sid->evsel->per_stream_periods);
+ hlist_add_head(&res->hnode, &sid->periods[hash]);
+
+ return &res->period;
+}
+
void perf_counts_values__scale(struct perf_counts_values *count,
bool scale, __s8 *pscaled)
{
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index f8de2bf89c76b..797dc9d782544 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -11,6 +11,32 @@
struct perf_thread_map;
struct xyarray;
+/**
+ * The per-thread accumulated period storage node.
+ */
+struct perf_sample_id_period {
+ struct list_head node;
+ struct hlist_node hnode;
+ /* Holds total ID period value for PERF_SAMPLE_READ processing. */
+ u64 period;
+ /* The TID that the values belongs to */
+ u32 tid;
+};
+
+/**
+ * perf_evsel_for_each_per_thread_period_safe - safely iterate thru all the
+ * per_stream_periods
+ * @evlist:perf_evsel instance to iterate
+ * @item: struct perf_sample_id_period iterator
+ * @tmp: struct perf_sample_id_period temp iterator
+ */
+#define perf_evsel_for_each_per_thread_period_safe(evsel, tmp, item) \
+ list_for_each_entry_safe(item, tmp, &(evsel)->per_stream_periods, node)
+
+
+#define PERF_SAMPLE_ID__HLIST_BITS 4
+#define PERF_SAMPLE_ID__HLIST_SIZE (1 << PERF_SAMPLE_ID__HLIST_BITS)
+
/*
* Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there are
* more than one entry in the evlist.
@@ -34,8 +60,18 @@ struct perf_sample_id {
pid_t machine_pid;
struct perf_cpu vcpu;
- /* Holds total ID period value for PERF_SAMPLE_READ processing. */
- u64 period;
+ union {
+ /*
+ * Holds total ID period value for PERF_SAMPLE_READ processing
+ * (when period is not per-thread).
+ */
+ u64 period;
+ /*
+ * Holds total ID period value for PERF_SAMPLE_READ processing
+ * (when period is per-thread).
+ */
+ struct hlist_head periods[PERF_SAMPLE_ID__HLIST_SIZE];
+ };
/*
* When inherit is combined with PERF_SAMPLE_READ, the period value is
@@ -65,6 +101,9 @@ struct perf_evsel {
u32 ids;
struct perf_evsel *leader;
+ /* Where period_per_thread is true, stores the per-thread values */
+ struct list_head per_stream_periods;
+
/* parse modifier helper */
int nr_members;
/*
@@ -97,4 +136,6 @@ void perf_evsel__free_id(struct perf_evsel *evsel);
bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel);
+u64 *perf_sample_id__get_period_storage(struct perf_sample_id *sid, u32 tid);
+
#endif /* __LIBPERF_INTERNAL_EVSEL_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 06d0bd7fb4599..2c66b730780ab 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1478,14 +1478,19 @@ static int deliver_sample_value(struct evlist *evlist,
{
struct perf_sample_id *sid = evlist__id2sid(evlist, v->id);
struct evsel *evsel;
+ u64 *storage = NULL;
if (sid) {
+ storage = perf_sample_id__get_period_storage(sid, sample->tid);
+ }
+
+ if (storage) {
sample->id = v->id;
- sample->period = v->value - sid->period;
- sid->period = v->value;
+ sample->period = v->value - *storage;
+ *storage = v->value;
}
- if (!sid || sid->evsel == NULL) {
+ if (!storage || sid->evsel == NULL) {
++evlist->stats.nr_unknown_id;
return 0;
}
--
2.45.1
The tool will now default to this new mode if the user specifies a
sampling group when not in system-wide mode, and when --no-inherit
is not specified.
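For example, a plain `perf record -e '{cycles,instructions}:S' -- <workload>`
(no system-wide flag, no --no-inherit) would now be set up with inherit
enabled and PERF_SAMPLE_TID added to the sample type, assuming the running
kernel accepts the combination.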
This change updates evsel to allow the combination of inherit
and PERF_SAMPLE_READ.
A fallback is implemented for kernel versions where this feature is not
supported.
Signed-off-by: Ben Gainey <[email protected]>
---
tools/perf/tests/attr/README | 2 +
.../tests/attr/test-record-group-sampling | 39 ------------
.../tests/attr/test-record-group-sampling1 | 50 ++++++++++++++++
.../tests/attr/test-record-group-sampling2 | 60 +++++++++++++++++++
tools/perf/tests/attr/test-record-group2 | 9 +--
tools/perf/util/evsel.c | 19 +++++-
tools/perf/util/evsel.h | 1 +
7 files changed, 135 insertions(+), 45 deletions(-)
delete mode 100644 tools/perf/tests/attr/test-record-group-sampling
create mode 100644 tools/perf/tests/attr/test-record-group-sampling1
create mode 100644 tools/perf/tests/attr/test-record-group-sampling2
diff --git a/tools/perf/tests/attr/README b/tools/perf/tests/attr/README
index 4066fec7180a8..67c4ca76b85d5 100644
--- a/tools/perf/tests/attr/README
+++ b/tools/perf/tests/attr/README
@@ -51,6 +51,8 @@ Following tests are defined (with perf commands):
perf record --call-graph fp kill (test-record-graph-fp-aarch64)
perf record -e '{cycles,instructions}' kill (test-record-group1)
perf record -e '{cycles/period=1/,instructions/period=2/}:S' kill (test-record-group2)
+ perf record -e '{cycles,cache-misses}:S' kill (test-record-group-sampling1)
+ perf record -c 10000 -e '{cycles,cache-misses}:S' kill (test-record-group-sampling2)
perf record -D kill (test-record-no-delay)
perf record -i kill (test-record-no-inherit)
perf record -n kill (test-record-no-samples)
diff --git a/tools/perf/tests/attr/test-record-group-sampling b/tools/perf/tests/attr/test-record-group-sampling
deleted file mode 100644
index 97e7e64a38f07..0000000000000
--- a/tools/perf/tests/attr/test-record-group-sampling
+++ /dev/null
@@ -1,39 +0,0 @@
-[config]
-command = record
-args = --no-bpf-event -e '{cycles,cache-misses}:S' kill >/dev/null 2>&1
-ret = 1
-
-[event-1:base-record]
-fd=1
-group_fd=-1
-sample_type=343
-read_format=12|28
-inherit=0
-
-[event-2:base-record]
-fd=2
-group_fd=1
-
-# cache-misses
-type=0
-config=3
-
-# default | PERF_SAMPLE_READ
-sample_type=343
-
-# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST
-read_format=12|28
-task=0
-mmap=0
-comm=0
-enable_on_exec=0
-disabled=0
-
-# inherit is disabled for group sampling
-inherit=0
-
-# sampling disabled
-sample_freq=0
-sample_period=0
-freq=0
-write_backward=0
diff --git a/tools/perf/tests/attr/test-record-group-sampling1 b/tools/perf/tests/attr/test-record-group-sampling1
new file mode 100644
index 0000000000000..9b87306266329
--- /dev/null
+++ b/tools/perf/tests/attr/test-record-group-sampling1
@@ -0,0 +1,50 @@
+[config]
+command = record
+args = --no-bpf-event -e '{cycles,cache-misses}:S' kill >/dev/null 2>&1
+ret = 1
+
+[event-1:base-record]
+fd=1
+group_fd=-1
+
+# cycles
+type=0
+config=0
+
+# default | PERF_SAMPLE_READ
+sample_type=343
+
+# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
+read_format=28|31
+task=1
+mmap=1
+comm=1
+enable_on_exec=1
+disabled=1
+
+# inherit is enabled for group sampling
+inherit=1
+
+[event-2:base-record]
+fd=2
+group_fd=1
+
+# cache-misses
+type=0
+config=3
+
+# default | PERF_SAMPLE_READ
+sample_type=343
+
+# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
+read_format=28|31
+task=0
+mmap=0
+comm=0
+enable_on_exec=0
+disabled=0
+freq=0
+
+# inherit is enabled for group sampling
+inherit=1
+
diff --git a/tools/perf/tests/attr/test-record-group-sampling2 b/tools/perf/tests/attr/test-record-group-sampling2
new file mode 100644
index 0000000000000..8e29fc13f6668
--- /dev/null
+++ b/tools/perf/tests/attr/test-record-group-sampling2
@@ -0,0 +1,60 @@
+[config]
+command = record
+args = --no-bpf-event -c 10000 -e '{cycles,cache-misses}:S' kill >/dev/null 2>&1
+ret = 1
+
+[event-1:base-record]
+fd=1
+group_fd=-1
+
+# cycles
+type=0
+config=0
+
+# default | PERF_SAMPLE_READ
+sample_type=87
+
+# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
+read_format=28|31
+task=1
+mmap=1
+comm=1
+enable_on_exec=1
+disabled=1
+
+# inherit is enabled for group sampling
+inherit=1
+
+# sampling disabled
+sample_freq=0
+sample_period=10000
+freq=0
+write_backward=0
+
+[event-2:base-record]
+fd=2
+group_fd=1
+
+# cache-misses
+type=0
+config=3
+
+# default | PERF_SAMPLE_READ
+sample_type=87
+
+# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
+read_format=28|31
+task=0
+mmap=0
+comm=0
+enable_on_exec=0
+disabled=0
+
+# inherit is enabled for group sampling
+inherit=1
+
+# sampling disabled
+sample_freq=0
+sample_period=0
+freq=0
+write_backward=0
diff --git a/tools/perf/tests/attr/test-record-group2 b/tools/perf/tests/attr/test-record-group2
index cebdaa8e64e47..785892a54d9e1 100644
--- a/tools/perf/tests/attr/test-record-group2
+++ b/tools/perf/tests/attr/test-record-group2
@@ -9,8 +9,9 @@ group_fd=-1
config=0|1
sample_period=1234000
sample_type=87
-read_format=12|28
-inherit=0
+read_format=28|31
+disabled=1
+inherit=1
freq=0
[event-2:base-record]
@@ -19,9 +20,9 @@ group_fd=1
config=0|1
sample_period=6789000
sample_type=87
-read_format=12|28
+read_format=28|31
disabled=0
-inherit=0
+inherit=1
mmap=0
comm=0
freq=0
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 3536404e9447b..557d409c53d6c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1156,7 +1156,15 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
*/
if (leader->core.nr_members > 1) {
attr->read_format |= PERF_FORMAT_GROUP;
- attr->inherit = 0;
+ }
+
+ /*
+ * Inherit + SAMPLE_READ requires SAMPLE_TID in the read_format
+ */
+ if (attr->inherit) {
+ evsel__set_sample_bit(evsel, TID);
+ evsel->core.attr.read_format |=
+ PERF_FORMAT_ID;
}
}
@@ -1832,6 +1840,8 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
static void evsel__disable_missing_features(struct evsel *evsel)
{
+ if (perf_missing_features.inherit_sample_read)
+ evsel->core.attr.inherit = 0;
if (perf_missing_features.branch_counters)
evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_COUNTERS;
if (perf_missing_features.read_lost)
@@ -1887,7 +1897,12 @@ bool evsel__detect_missing_features(struct evsel *evsel)
* Must probe features in the order they were added to the
* perf_event_attr interface.
*/
- if (!perf_missing_features.branch_counters &&
+ if (!perf_missing_features.inherit_sample_read &&
+ evsel->core.attr.inherit && (evsel->core.attr.sample_type & PERF_SAMPLE_READ)) {
+ perf_missing_features.inherit_sample_read = true;
+ pr_debug2("Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit, falling back to no-inherit.\n");
+ return true;
+ } else if (!perf_missing_features.branch_counters &&
(evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
perf_missing_features.branch_counters = true;
pr_debug2("switching off branch counters support\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 517cff431de20..21b8b7e70e75e 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -192,6 +192,7 @@ struct perf_missing_features {
bool weight_struct;
bool read_lost;
bool branch_counters;
+ bool inherit_sample_read;
};
extern struct perf_missing_features perf_missing_features;
--
2.45.1
Hello,
On Tue, May 21, 2024 at 6:30 AM Ben Gainey <[email protected]> wrote:
>
> This change allows events to use PERF_SAMPLE_READ with inherit
> so long as PERF_SAMPLE_TID is also set.
>
> In this configuration, an event will be inherited into any
> child processes / threads, allowing convenient profiling of a
> multiprocess or multithreaded application, whilst allowing
> profiling tools to collect per-thread samples, in particular
> of groups of counters.
>
> The read_format field of both PERF_RECORD_READ and PERF_RECORD_SAMPLE
> are changed by this new configuration, but calls to `read()` on the same
> event file descriptor are unaffected and continue to return the
> cumulative total.
>
> Signed-off-by: Ben Gainey <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Thanks,
Namhyung
On Tue, May 21, 2024 at 6:30 AM Ben Gainey <[email protected]> wrote:
>
> The tool will now default to this new mode if the user specifies a
> sampling group when not in system-wide mode, and when --no-inherit
> is not specified.
>
> This change updates evsel to allow the combination of inherit
> and PERF_SAMPLE_READ.
>
> A fallback is implemented for kernel versions where this feature is not
> supported.
But I'm afraid the test would fail on old kernels. Maybe we need to
put it in the selftests.
Thanks,
Namhyung
On Thu, 2024-05-23 at 18:41 -0700, Namhyung Kim wrote:
> On Tue, May 21, 2024 at 6:30 AM Ben Gainey <[email protected]> wrote:
> >
> > The tool will now default to this new mode if the user specifies a
> > sampling group when not in system-wide mode, and when --no-inherit
> > is not specified.
> >
> > This change updates evsel to allow the combination of inherit
> > and PERF_SAMPLE_READ.
> >
> > A fallback is implemented for kernel versions where this feature is
> > not supported.
>
> But I'm afraid the test would fail on old kernels. Maybe we need to
> put it in the selftests.
>
Sorry, not clear what you mean?
Is the issue that the fallback on older kernels fails, or that the
"perf test" tests fail?
Thanks
Ben
> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ben Gainey <[email protected]>
> > ---
> > tools/perf/tests/attr/README | 2 +
> > .../tests/attr/test-record-group-sampling | 39 ------------
> > .../tests/attr/test-record-group-sampling1 | 50
> > ++++++++++++++++
> > .../tests/attr/test-record-group-sampling2 | 60
> > +++++++++++++++++++
> > tools/perf/tests/attr/test-record-group2 | 9 +--
> > tools/perf/util/evsel.c | 19 +++++-
> > tools/perf/util/evsel.h | 1 +
> > 7 files changed, 135 insertions(+), 45 deletions(-)
> > delete mode 100644 tools/perf/tests/attr/test-record-group-
> > sampling
> > create mode 100644 tools/perf/tests/attr/test-record-group-
> > sampling1
> > create mode 100644 tools/perf/tests/attr/test-record-group-
> > sampling2
> >
> > diff --git a/tools/perf/tests/attr/README
> > b/tools/perf/tests/attr/README
> > index 4066fec7180a8..67c4ca76b85d5 100644
> > --- a/tools/perf/tests/attr/README
> > +++ b/tools/perf/tests/attr/README
> > @@ -51,6 +51,8 @@ Following tests are defined (with perf commands):
> > perf record --call-graph fp kill (test-record-
> > graph-fp-aarch64)
> > perf record -e '{cycles,instructions}' kill (test-record-
> > group1)
> > perf record -e '{cycles/period=1/,instructions/period=2/}:S'
> > kill (test-record-group2)
> > + perf record -e '{cycles,cache-misses}:S' kill (test-record-
> > group-sampling1)
> > + perf record -c 10000 -e '{cycles,cache-misses}:S' kill (test-
> > record-group-sampling2)
> > perf record -D kill (test-record-no-
> > delay)
> > perf record -i kill (test-record-no-
> > inherit)
> > perf record -n kill (test-record-no-
> > samples)
> > diff --git a/tools/perf/tests/attr/test-record-group-sampling
> > b/tools/perf/tests/attr/test-record-group-sampling
> > deleted file mode 100644
> > index 97e7e64a38f07..0000000000000
> > --- a/tools/perf/tests/attr/test-record-group-sampling
> > +++ /dev/null
> > @@ -1,39 +0,0 @@
> > -[config]
> > -command = record
> > -args = --no-bpf-event -e '{cycles,cache-misses}:S' kill
> > >/dev/null 2>&1
> > -ret = 1
> > -
> > -[event-1:base-record]
> > -fd=1
> > -group_fd=-1
> > -sample_type=343
> > -read_format=12|28
> > -inherit=0
> > -
> > -[event-2:base-record]
> > -fd=2
> > -group_fd=1
> > -
> > -# cache-misses
> > -type=0
> > -config=3
> > -
> > -# default | PERF_SAMPLE_READ
> > -sample_type=343
> > -
> > -# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST
> > -read_format=12|28
> > -task=0
> > -mmap=0
> > -comm=0
> > -enable_on_exec=0
> > -disabled=0
> > -
> > -# inherit is disabled for group sampling
> > -inherit=0
> > -
> > -# sampling disabled
> > -sample_freq=0
> > -sample_period=0
> > -freq=0
> > -write_backward=0
> > diff --git a/tools/perf/tests/attr/test-record-group-sampling1
> > b/tools/perf/tests/attr/test-record-group-sampling1
> > new file mode 100644
> > index 0000000000000..9b87306266329
> > --- /dev/null
> > +++ b/tools/perf/tests/attr/test-record-group-sampling1
> > @@ -0,0 +1,50 @@
> > +[config]
> > +command = record
> > +args = --no-bpf-event -e '{cycles,cache-misses}:S' kill
> > >/dev/null 2>&1
> > +ret = 1
> > +
> > +[event-1:base-record]
> > +fd=1
> > +group_fd=-1
> > +
> > +# cycles
> > +type=0
> > +config=0
> > +
> > +# default | PERF_SAMPLE_READ
> > +sample_type=343
> > +
> > +# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST |
> > PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
> > +read_format=28|31
> > +task=1
> > +mmap=1
> > +comm=1
> > +enable_on_exec=1
> > +disabled=1
> > +
> > +# inherit is enabled for group sampling
> > +inherit=1
> > +
> > +[event-2:base-record]
> > +fd=2
> > +group_fd=1
> > +
> > +# cache-misses
> > +type=0
> > +config=3
> > +
> > +# default | PERF_SAMPLE_READ
> > +sample_type=343
> > +
> > +# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST |
> > PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
> > +read_format=28|31
> > +task=0
> > +mmap=0
> > +comm=0
> > +enable_on_exec=0
> > +disabled=0
> > +freq=0
> > +
> > +# inherit is enabled for group sampling
> > +inherit=1
> > +
> > diff --git a/tools/perf/tests/attr/test-record-group-sampling2
> > b/tools/perf/tests/attr/test-record-group-sampling2
> > new file mode 100644
> > index 0000000000000..8e29fc13f6668
> > --- /dev/null
> > +++ b/tools/perf/tests/attr/test-record-group-sampling2
> > @@ -0,0 +1,60 @@
> > +[config]
> > +command = record
> > +args = --no-bpf-event -c 10000 -e '{cycles,cache-misses}:S'
> > kill >/dev/null 2>&1
> > +ret = 1
> > +
> > +[event-1:base-record]
> > +fd=1
> > +group_fd=-1
> > +
> > +# cycles
> > +type=0
> > +config=0
> > +
> > +# default | PERF_SAMPLE_READ
> > +sample_type=87
> > +
> > +# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST |
> > PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
> > +read_format=28|31
> > +task=1
> > +mmap=1
> > +comm=1
> > +enable_on_exec=1
> > +disabled=1
> > +
> > +# inherit is enabled for group sampling
> > +inherit=1
> > +
> > +# sampling disabled
> > +sample_freq=0
> > +sample_period=10000
> > +freq=0
> > +write_backward=0
> > +
> > +[event-2:base-record]
> > +fd=2
> > +group_fd=1
> > +
> > +# cache-misses
> > +type=0
> > +config=3
> > +
> > +# default | PERF_SAMPLE_READ
> > +sample_type=87
> > +
> > +# PERF_FORMAT_ID | PERF_FORMAT_GROUP | PERF_FORMAT_LOST |
> > PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
> > +read_format=28|31
> > +task=0
> > +mmap=0
> > +comm=0
> > +enable_on_exec=0
> > +disabled=0
> > +
> > +# inherit is enabled for group sampling
> > +inherit=1
> > +
> > +# sampling disabled
> > +sample_freq=0
> > +sample_period=0
> > +freq=0
> > +write_backward=0
> > diff --git a/tools/perf/tests/attr/test-record-group2
> > b/tools/perf/tests/attr/test-record-group2
> > index cebdaa8e64e47..785892a54d9e1 100644
> > --- a/tools/perf/tests/attr/test-record-group2
> > +++ b/tools/perf/tests/attr/test-record-group2
> > @@ -9,8 +9,9 @@ group_fd=-1
> > config=0|1
> > sample_period=1234000
> > sample_type=87
> > -read_format=12|28
> > -inherit=0
> > +read_format=28|31
> > +disabled=1
> > +inherit=1
> > freq=0
> >
> > [event-2:base-record]
> > @@ -19,9 +20,9 @@ group_fd=1
> > config=0|1
> > sample_period=6789000
> > sample_type=87
> > -read_format=12|28
> > +read_format=28|31
> > disabled=0
> > -inherit=0
> > +inherit=1
> > mmap=0
> > comm=0
> > freq=0
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 3536404e9447b..557d409c53d6c 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -1156,7 +1156,15 @@ void evsel__config(struct evsel *evsel,
> > struct record_opts *opts,
> > */
> > if (leader->core.nr_members > 1) {
> > attr->read_format |= PERF_FORMAT_GROUP;
> > - attr->inherit = 0;
> > + }
> > +
> > + /*
> > + * Inherit + SAMPLE_READ requires SAMPLE_TID in the
> > read_format
> > + */
> > + if (attr->inherit) {
> > + evsel__set_sample_bit(evsel, TID);
> > + evsel->core.attr.read_format |=
> > + PERF_FORMAT_ID;
> > }
> > }
> >
> > @@ -1832,6 +1840,8 @@ static int __evsel__prepare_open(struct evsel
> > *evsel, struct perf_cpu_map *cpus,
> >
> > static void evsel__disable_missing_features(struct evsel *evsel)
> > {
> > + if (perf_missing_features.inherit_sample_read)
> > + evsel->core.attr.inherit = 0;
> > if (perf_missing_features.branch_counters)
> > evsel->core.attr.branch_sample_type &=
> > ~PERF_SAMPLE_BRANCH_COUNTERS;
> > if (perf_missing_features.read_lost)
> > @@ -1887,7 +1897,12 @@ bool evsel__detect_missing_features(struct evsel *evsel)
> > * Must probe features in the order they were added to the
> > * perf_event_attr interface.
> > */
> > - if (!perf_missing_features.branch_counters &&
> > + if (!perf_missing_features.inherit_sample_read &&
> > +	    evsel->core.attr.inherit && (evsel->core.attr.sample_type & PERF_SAMPLE_READ)) {
> > + perf_missing_features.inherit_sample_read = true;
> > + pr_debug2("Using PERF_SAMPLE_READ / :S modifier is
> > not compatible with inherit, falling back to no-inherit.\n");
> > + return true;
> > + } else if (!perf_missing_features.branch_counters &&
> > (evsel->core.attr.branch_sample_type &
> > PERF_SAMPLE_BRANCH_COUNTERS)) {
> > perf_missing_features.branch_counters = true;
> > pr_debug2("switching off branch counters
> > support\n");
> > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > index 517cff431de20..21b8b7e70e75e 100644
> > --- a/tools/perf/util/evsel.h
> > +++ b/tools/perf/util/evsel.h
> > @@ -192,6 +192,7 @@ struct perf_missing_features {
> > bool weight_struct;
> > bool read_lost;
> > bool branch_counters;
> > + bool inherit_sample_read;
> > };
> >
> > extern struct perf_missing_features perf_missing_features;
> > --
> > 2.45.1
> >
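For reference when reading the attr-test expectations quoted above, the numeric masks decode as follows (worked out from the flag values in include/uapi/linux/perf_event.h; a pipe in an expected value, such as 28|31, means the test accepts either encoding):

  sample_type=87  -> IP(1) | TID(2) | TIME(4) | READ(16) | ID(64)
  sample_type=343 -> the above | PERIOD(256)
  read_format=12  -> ID(4) | GROUP(8)
  read_format=28  -> ID(4) | GROUP(8) | LOST(16)
  read_format=31  -> TOTAL_TIME_ENABLED(1) | TOTAL_TIME_RUNNING(2) | ID(4) | GROUP(8) | LOST(16)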
On Mon, 2024-05-27 at 18:47 +0100, Ben Gainey wrote:
> On Thu, 2024-05-23 at 18:41 -0700, Namhyung Kim wrote:
> > On Tue, May 21, 2024 at 6:30 AM Ben Gainey <[email protected]>
> > wrote:
> > >
> > > The tool will now default to this new mode if the user specifies
> > > a
> > > sampling group when not in system-wide mode, and when --no-
> > > inherit
> > > is not specified.
> > >
> > > This change updates evsel to allow the combination of inherit
> > > and PERF_SAMPLE_READ.
> > >
> > > A fallback is implemented for kernel versions where this feature
> > > is
> > > not
> > > supported.
> >
> > But I'm afraid the test would fail on old kernels. Maybe we need
> > to
> > put it in the selftests.
> >
>
> Sorry, not clear what you mean?
>
> Is the issue that the fallback on older kernels fails, or that the
> "perf test" tests fail?
>
> Thanks
> Ben
Just to follow up, I've rechecked the fallback on an unmodified 6.9.1
kernel with the following:
perf record -vv -e '{cycles,instructions}:S' ls
With an unpatched version of perf running on an unpatched kernel, the
cycles event is opened as:
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|ID|PERIOD
read_format ID|GROUP|LOST
disabled 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
whereas with these patches applied to perf, on an unpatched kernel, the
output is as follows:
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|ID|PERIOD
read_format ID|GROUP|LOST
disabled 1
inherit 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 3442954 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
Using PERF_SAMPLE_READ / :S modifier is not compatible with
inherit, falling back to no-inherit.
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|ID|PERIOD
read_format ID|GROUP|LOST
disabled 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
The command falls back to the same configuration as was previously
used. The same is true for the instructions event.
`perf test` fails on an unpatched kernel in "15: Setup struct
perf_event_attr" for the test "test-record-group-sampling1", but surely
that is expected on an unpatched kernel?
Is there some very-old kernel version where this would be expected to
succeed by accident?
Thanks
Ben
Hello,
On Wed, May 29, 2024 at 10:48 AM Ben Gainey <[email protected]> wrote:
>
> Just to follow up, I've rechecked the fallback on an unmodified 6.9.1
> kernel with the following:
>
> perf record -vv -e '{cycles,instructions}:S' ls
>
> The command falls back to the same configuration as was previously
> used. The same is true for the instructions event.
>
> `perf test` fails on an unpatched kernel in "15: Setup struct
> perf_event_attr" for the test "test-record-group-sampling1" but that is
> surely expected for unpatched kernels?
>
> Is there some very-old kernel version where this would be expected to
> succeed by accident?
I don't think so, but I don't want the test to fail depending on the
kernel version. Maybe we can check the allowed combination
first and skip the test if perf_event_open() fails. And then it can
verify if the kernel rejects the unsupported combinations. Not
sure if it's easy to do that in the current attr test framework.
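For what it's worth, a stand-alone probe along those lines could be quite
small. The sketch below is only an illustration, not an existing interface:
the exit status used for "skip" and any wiring into the attr framework are
assumptions.

------------------------------------------------------------
/* probe-inherit-sample-read.c: exit 0 if the kernel accepts
 * inherit + PERF_SAMPLE_READ, 2 ("skip") if it rejects it with EINVAL. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
			   int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;		/* no hardware PMU needed */
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.sample_period = 100000;
	attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_READ;
	attr.read_format = PERF_FORMAT_ID;
	attr.inherit = 1;			/* the combination under test */
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	fd = perf_event_open(&attr, 0 /* self */, -1, -1, 0);
	if (fd < 0) {
		if (errno == EINVAL) {
			fprintf(stderr, "inherit + PERF_SAMPLE_READ unsupported, skip\n");
			return 2;		/* hypothetical "skip" status */
		}
		perror("perf_event_open");
		return 1;
	}
	close(fd);
	return 0;
}
------------------------------------------------------------

Using a software event keeps the probe independent of the hardware PMU, so
an EINVAL here is very likely the attr combination itself rather than a
missing counter.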
Thanks,
Namhyung
On 29/05/2024 20:18, Namhyung Kim wrote:
> Hello,
>
> On Wed, May 29, 2024 at 10:48 AM Ben Gainey <[email protected]> wrote:
>>
>> `perf test` fails on an unpatched kernel in "15: Setup struct
>> perf_event_attr" for the test "test-record-group-sampling1" but that is
>> surely expected for unpatched kernels?
>>
>> Is there some very-old kernel version where this would be expected to
>> succeed by accident?
>
> I don't think so but I don't want the test to fail depending on the
> kernel version. Maybe we can check the allowed combination
> first and skip the test if perf_event_open() fails. And then it can
> verify if the kernel rejects the unsupported combinations. Not
> sure if it's easy to do that in the current attr test framework.
>
> Thanks,
> Namhyung
I added the kernel version feature to the attr tests. It seems you can add
two tests, one for kernels before this change and one for after.
Search for "kernel_until" in tests/attr.