2024-04-15 08:34:58

by Ben Gainey

Subject: [PATCH v5 0/4] perf: Support PERF_SAMPLE_READ with inherit

This change allows events to use PERF_SAMPLE_READ with inherit so long
as PERF_SAMPLE_TID is also set.

Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
restriction assumes the user is interested in collecting aggregate
statistics as per `perf stat`. It prevents a user from collecting
per-thread samples using counter groups from a multi-threaded or
multi-process application, as with `perf record -e '{....}:S'`. Instead
users must use system-wide mode, or forgo the ability to sample counter
groups. System-wide mode is often problematic as it requires specific
permissions (CAP_PERFMON or root access), or may lead to capture of
significant amounts of extra data from other processes running on the
system.

This patch changes `perf_event_alloc`, relaxing the restriction against
combining `inherit` with `PERF_SAMPLE_READ` so that the combination
is allowed so long as `PERF_SAMPLE_TID` is enabled. It modifies
sampling so that only the count associated with the active thread is
recorded into the buffer. It modifies the context switch handling so
that perf contexts are always switched out if they have this kind of
event so that the correct per-thread state is maintained. Finally, the
tools are updated to allow perf record to specify this combination and
to correctly decode the sample data.
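
For illustration only (this sketch is not part of the series), the
attribute combination the kernel change is meant to accept might be set
up from userspace roughly as follows. The names
`init_inherited_group_leader` and `sys_perf_event_open` are hypothetical
helpers, and the choice of a cycles event, the sample period and the
`read_format` flags are assumptions made for the example:

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill in a group-leader attr that samples the
 * whole group (PERF_SAMPLE_READ) and is inherited by child tasks.
 */
void init_inherited_group_leader(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;	/* assumption: a cycles event */
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;		/* assumption: arbitrary period */

	/* PERF_SAMPLE_TID must accompany PERF_SAMPLE_READ when inherit is set. */
	attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_READ;
	/* Report each group member's value together with its id. */
	attr->read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;

	attr->inherit = 1;	/* follow child threads and processes */
	attr->disabled = 1;	/* start stopped; enable once the group is built */
}

/* Thin wrapper: glibc provides no perf_event_open() function. */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			 int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

On a kernel without this change, opening an event with this attr is
expected to fail because of the existing restriction; with it, the same
open should succeed provided `PERF_SAMPLE_TID` stays set.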

In this configuration, stream ids (such as may appear in the read_format
field of a PERF_RECORD_SAMPLE) are no longer globally unique; rather,
the pair (stream id, tid) uniquely identifies each event. Tools that
rely on stream ids being unique, for example to calculate a delta
between samples, would need updating to take this into account.
Previously valid event configurations (system-wide, no-inherit and so
on), where the stream id alone identifies the event, are unaffected.
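
To make the (stream id, tid) point concrete, a tool computing deltas
between samples might key its last-seen values as sketched below. This
is an illustration under assumptions, not the code from the tools
patches: `delta_table`, `delta_update`, the fixed-size linear table and
treating the first observation as a delta from zero are all choices made
for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 256	/* assumption: small fixed capacity for the sketch */

/* Last value seen for one (stream id, tid) pair. */
struct counter_slot {
	uint64_t stream_id;
	uint32_t tid;
	uint64_t last_value;
};

struct delta_table {
	struct counter_slot slots[MAX_TRACKED];
	size_t count;
};

/*
 * Record a new counter value for (stream_id, tid) and store the increase
 * since the previous sample of the same pair in *delta. The first sample
 * for a pair is treated as a delta from zero.
 * Returns 0 on success, -1 if the table is full.
 */
int delta_update(struct delta_table *t, uint64_t stream_id, uint32_t tid,
		 uint64_t value, uint64_t *delta)
{
	for (size_t i = 0; i < t->count; i++) {
		struct counter_slot *s = &t->slots[i];

		/* Match on both fields: the stream id alone is ambiguous. */
		if (s->stream_id == stream_id && s->tid == tid) {
			*delta = value - s->last_value;
			s->last_value = value;
			return 0;
		}
	}

	if (t->count == MAX_TRACKED)
		return -1;

	t->slots[t->count++] = (struct counter_slot){
		.stream_id = stream_id,
		.tid = tid,
		.last_value = value,
	};
	*delta = value;
	return 0;
}
```

Keying on the stream id alone would mix the counts of different threads
and produce meaningless (possibly wrapped) deltas; a real tool would
likely use a hash table rather than a linear scan.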


Changes since v4:
- Rebase on v6.9-rc1
- Removed the dependency on inherit_stat that was previously assumed
necessary as per feedback from Namhyung Kim.
- Fixed an incorrect use of zfree instead of free in the tools leading
to an abort on tool shutdown.
- Additional test coverage improvements added to perf test.
- Cleaned up the remaining bit of irrelevant change missed between v3
and v4.

Changes since v3:
- Cleaned up perf test data changes incorrectly included into this
series from elsewhere.

Changes since v2:
- Rebase on v6.8
- Respond to James Clarke's feedback; fixup some typos and move some
repeated checks into a helper macro.
- Cleaned up checkpatch lints.
- Updated perf test; fixed evsel handling so that existing tests pass
and added new tests to cover the new behaviour.

Changes since v1:
- Rebase on v6.8-rc1
- Fixed value written into sample after child exits.
- Modified handling of switch-out so that contexts with these events
take the slow path, so that the per-event/per-thread PMU state is
correctly switched.
- Modified perf tools to support this mode of operation.

Ben Gainey (4):
  perf: Support PERF_SAMPLE_READ with inherit
  tools/perf: Track where perf_sample_ids need per-thread periods
  tools/perf: Correctly calculate sample period for inherited
    SAMPLE_READ values
  tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events

 include/linux/perf_event.h                  |  1 +
 kernel/events/core.c                        | 82 ++++++++++++++-----
 tools/lib/perf/evlist.c                     |  1 +
 tools/lib/perf/evsel.c                      | 48 +++++++++++
 tools/lib/perf/include/internal/evsel.h     | 54 +++++++++++-
 tools/perf/tests/attr/README                |  2 +
 .../tests/attr/test-record-group-sampling   | 39 ---------
 .../tests/attr/test-record-group-sampling1  | 50 +++++++++++
 .../tests/attr/test-record-group-sampling2  | 60 ++++++++++++++
 tools/perf/tests/attr/test-record-group2    |  9 +-
 tools/perf/util/evsel.c                     | 19 ++++-
 tools/perf/util/evsel.h                     |  1 +
 tools/perf/util/session.c                   | 11 ++-
 13 files changed, 306 insertions(+), 71 deletions(-)
 delete mode 100644 tools/perf/tests/attr/test-record-group-sampling
 create mode 100644 tools/perf/tests/attr/test-record-group-sampling1
 create mode 100644 tools/perf/tests/attr/test-record-group-sampling2

--
2.44.0



2024-05-09 09:06:43

by Ben Gainey

Subject: Re: [PATCH v5 0/4] perf: Support PERF_SAMPLE_READ with inherit

Hello folks

I appreciate you're all busy, but any feedback on this one?

Thanks
Ben




On Mon, 2024-04-15 at 09:14 +0100, Ben Gainey wrote:
> [...]
