2023-08-26 14:10:25

by Yang Jihong

[permalink] [raw]
Subject: [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs

User space tasks can migrate between CPUs, track sideband events for all
CPUs.

The specific scenarios are as follows:

CPU0 CPU1
perf record -C 0 start
taskA starts to be created and executed
-> PERF_RECORD_COMM and PERF_RECORD_MMAP
events only deliver to CPU1
......
|
migrate to CPU0
|
Running on CPU0 <----------/
...

perf record -C 0 stop

Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.

The sys_perf_event_open invoked is as follows:

# perf --debug verbose=3 record -e cpu-clock -C 1 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
Opening: dummy:u
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID|LOST
inherit 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
<SNIP>

Changes since_v6:
- Patch1:
1. No change.
2. Keep Acked-by tag from Adrian.
- Patch2:
1. Update commit message as suggested by Ian.
2. Keep Acked-by tag from Adrian because code is not modified.
- Patch3:
1. Update comment as suggested by Ian.
2. Merge original patch5 ("perf test: Update base-record & system-wide-dummy attr") as suggested by Ian.
3. Only merge commit, keep Acked-by tag from Adrian.
- Patch4:
1. No change. Because Adrian recommends not changing the function name.
2. Keep Acked-by tag from Adrian.
- Patch5:
1. Add cleanup on trap function as suggested by Ian.
2. Remove Tested-by tag from Adrian because the script is modified.
- Patch6:
1. Add Reviewed-by tag from Ian.

Changes since_v5:
- No code changes.
- Detailed commit message of patch3.
- Add Acked-by and Tested-by tags from Adrian Hunter.

Changes since_v4:
- Simplify check code for record__tracking_system_wide().
- Add perf attr test result to commit message for patch 7.

Changes since_v3:
- Check fall_kernel, all_user, and dummy or exclude_user when determining
whether system wide is required.

Changes since_v2:
- Rename record_tracking.sh to record_sideband.sh in tools/perf/tests/shell.
- Remove "perf evlist: Skip dummy event sample_type check for evlist_config" patch.
- Add opts->all_kernel check in record__config_tracking_events().
- Add perf_event_attr test for record selected CPUs exclude_user.
- Update base-record & system-wide-dummy sample_type attr expected values for test-record-C0.

Changes since v1:
- Add perf_evlist__go_system_wide() via internal/evlist.h instead of
exporting perf_evlist__propagate_maps().
- Use evlist__add_aux_dummy() instead of evlist__add_dummy() in
evlist__findnew_tracking_event().
- Add a parameter in evlist__findnew_tracking_event() to deal with
system_wide inside.
- Add sideband for all CPUs when tracing selected CPUs comments on
the perf record man page.
- Use "sideband events" instead of "tracking events".
- Adjust the patches Sequence.
- Add patch5 to skip dummy event sample_type check for evlist_config.
- Add patch6 to update system-wide-dummy attr values for perf test.

Yang Jihong (6):
perf evlist: Add perf_evlist__go_system_wide() helper
perf evlist: Add evlist__findnew_tracking_event() helper
perf record: Move setting tracking events before
record__init_thread_masks()
perf record: Track sideband events for all CPUs when tracing selected
CPUs
perf test: Add test case for record sideband events
perf test: Add perf_event_attr test for record selected CPUs
exclude_user

tools/lib/perf/evlist.c | 9 ++
tools/lib/perf/include/internal/evlist.h | 2 +
tools/perf/Documentation/perf-record.txt | 3 +
tools/perf/builtin-record.c | 106 +++++++++++++-----
tools/perf/tests/attr/system-wide-dummy | 14 ++-
tools/perf/tests/attr/test-record-C0 | 4 +-
.../perf/tests/attr/test-record-C0-all-kernel | 32 ++++++
tools/perf/tests/shell/record_sideband.sh | 58 ++++++++++
tools/perf/util/evlist.c | 18 +++
tools/perf/util/evlist.h | 1 +
10 files changed, 212 insertions(+), 35 deletions(-)
create mode 100644 tools/perf/tests/attr/test-record-C0-all-kernel
create mode 100755 tools/perf/tests/shell/record_sideband.sh

--
2.30.GIT



2023-08-29 13:17:03

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs

Hello,

On 2023/8/29 13:37, Ravi Bangoria wrote:
> On 26-Aug-23 8:56 AM, Yang Jihong wrote:
>> User space tasks can migrate between CPUs, track sideband events for all
>> CPUs.
>
> I've tested this series with simple test program and it mostly works fine.
>
> Tested-by: Ravi Bangoria <[email protected]>
>
Thanks for the test.

> Since we are recording sideband data on all cpus, perf will endup recording
> lots of unnecessary data, esp. on large systems. E.g. on a 128 cpu system:
>
> Without patch:
> $ sudo ./perf record -C 10 -o perf.data.without -- sleep 10
> $ du -d1 -ha ./perf.data.without
> 3.0M ./perf.data.without
>
> $ sudo ./perf report --stats -i perf.data.without
> Aggregated stats:
> TOTAL events: 47011
> MMAP events: 51 ( 0.1%)
> COMM events: 1549 ( 3.3%)
> EXIT events: 105 ( 0.2%)
> FORK events: 1544 ( 3.3%)
> SAMPLE events: 38226 (81.3%)
> MMAP2 events: 5485 (11.7%)
> ...
> cycles:P stats:
> SAMPLE events: 38226
>
> With patch:
> $ sudo ./perf record -C 10 -o perf.data.with -- sleep 10
> $ du -d1 -ha ./perf.data.with
> 15M ./perf.data.with
>
> $ sudo ./perf report --stats -i perf.data.with
> Aggregated stats:
> TOTAL events: 160164
> MMAP events: 51 ( 0.0%)
> COMM events: 12879 ( 8.0%)
> EXIT events: 11192 ( 7.0%)
> FORK events: 12669 ( 7.9%)
> SAMPLE events: 38464 (24.0%)
> MMAP2 events: 84844 (53.0%)
> ...
> cycles:P stats:
> SAMPLE events: 38464
>
> Number of actual samples are same ~38K. However, the perf.data file is 5x
> bigger because of additional sideband data.

Yes, if record system wide sideband data, the amount of sideband events
will increase proportionally, which is expected.

>
> I'm pretty sure we don't need most of those additional data. So, thinking
> loud, should we post-process perf.data file and filter out unnecessary data?
>

I wonder if we can add a new function in perf inject.
By reading perf.data and comparing tid of SAMPLE events and sideband
events, we can filter out the sideband data of unmatched tasks.

That's just my idea, or there's another better solution.

Thanks,
Yang