User space tasks can migrate between CPUs, track sideband events for all
CPUs.
The specific scenarios are as follows:
CPU0 CPU1
perf record -C 0 start
taskA starts to be created and executed
-> PERF_RECORD_COMM and PERF_RECORD_MMAP
events only deliver to CPU1
......
|
migrate to CPU0
|
Running on CPU0 <----------/
...
perf record -C 0 stop
Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.
The sys_perf_event_open invoked is as follows:
# perf --debug verbose=3 record -e cpu-clock -C 1 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
Opening: dummy:u
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID|LOST
inherit 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
<SNIP>
Changes since_v5:
- No code changes.
- Detailed commit message of patch3.
- Add Acked-by and Tested-by tags from Adrian Hunter.
Changes since_v4:
- Simplify check code for record__tracking_system_wide().
- Add perf attr test result to commit message for patch 7.
Changes since_v3:
- Check fall_kernel, all_user, and dummy or exclude_user when determining
whether system wide is required.
Changes since_v2:
- Rename record_tracking.sh to record_sideband.sh in tools/perf/tests/shell.
- Remove "perf evlist: Skip dummy event sample_type check for evlist_config" patch.
- Add opts->all_kernel check in record__config_tracking_events().
- Add perf_event_attr test for record selected CPUs exclude_user.
- Update base-record & system-wide-dummy sample_type attr expected values for test-record-C0.
Changes since v1:
- Add perf_evlist__go_system_wide() via internal/evlist.h instead of
exporting perf_evlist__propagate_maps().
- Use evlist__add_aux_dummy() instead of evlist__add_dummy() in
evlist__findnew_tracking_event().
- Add a parameter in evlist__findnew_tracking_event() to deal with
system_wide inside.
- Add sideband for all CPUs when tracing selected CPUs comments on
the perf record man page.
- Use "sideband events" instead of "tracking events".
- Adjust the patches Sequence.
- Add patch5 to skip dummy event sample_type check for evlist_config.
- Add patch6 to update system-wide-dummy attr values for perf test.
Yang Jihong (7):
perf evlist: Add perf_evlist__go_system_wide() helper
perf evlist: Add evlist__findnew_tracking_event() helper
perf record: Move setting dummy tracking before
record__init_thread_masks()
perf record: Track sideband events for all CPUs when tracing selected
CPUs
perf test: Update base-record & system-wide-dummy attr expected values
for test-record-C0
perf test: Add test case for record sideband events
perf test: Add perf_event_attr test for record selected CPUs
exclude_user
tools/lib/perf/evlist.c | 9 ++
tools/lib/perf/include/internal/evlist.h | 2 +
tools/perf/Documentation/perf-record.txt | 3 +
tools/perf/builtin-record.c | 106 +++++++++++++-----
tools/perf/tests/attr/system-wide-dummy | 14 ++-
tools/perf/tests/attr/test-record-C0 | 4 +-
.../perf/tests/attr/test-record-C0-all-kernel | 32 ++++++
tools/perf/tests/shell/record_sideband.sh | 44 ++++++++
tools/perf/util/evlist.c | 18 +++
tools/perf/util/evlist.h | 1 +
10 files changed, 198 insertions(+), 35 deletions(-)
create mode 100644 tools/perf/tests/attr/test-record-C0-all-kernel
create mode 100755 tools/perf/tests/shell/record_sideband.sh
--
2.30.GIT
User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.
The specific scenarios are as follows:
CPU0 CPU1
perf record -C 0 start
taskA starts to be created and executed
-> PERF_RECORD_COMM and PERF_RECORD_MMAP
events only deliver to CPU1
......
|
migrate to CPU0
|
Running on CPU0 <----------/
...
perf record -C 0 stop
Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.
The solution is to record sideband events for all CPUs when tracing
selected CPUs. Because this modifies the default behavior, add related
comments to the perf record man page.
The sys_perf_event_open invoked is as follows:
# perf --debug verbose=3 record -e cpu-clock -C 1 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
Opening: dummy:u
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID|LOST
inherit 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
<SNIP>
Signed-off-by: Yang Jihong <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 3 ++
tools/perf/builtin-record.c | 44 +++++++++++++++++++++++-
2 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d5217be012d7..1889f66addf2 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
In per-thread mode with inheritance mode on (default), samples are captured only when
the thread executes on the designated CPUs. Default is to monitor all CPUs.
+User space tasks can migrate between CPUs, so when tracing selected CPUs,
+a dummy event is created to track sideband for all CPUs.
+
-B::
--no-buildid::
Do not save the build ids of binaries in the perf.data files. This skips
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4ee94058028f..ae2e21b945fa 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
}
+static bool record__tracking_system_wide(struct record *rec)
+{
+ struct record_opts *opts = &rec->opts;
+ struct evlist *evlist = rec->evlist;
+ struct evsel *evsel;
+
+ /*
+ * If all (non-dummy) evsel have exclude_user,
+ * system_wide is not needed.
+ *
+ * all_kernel and all_user will overwrite exclude_kernel and
+ * exclude_user of attr in evsel__config(), here need to check
+ * all the three items.
+ *
+ * Sideband system wide if one of the following conditions is met:
+ *
+ * - all_user is set, and there is a non-dummy event
+ * - all_user and all_kernel are not set, and there is
+ * a non-dummy event without exclude_user
+ */
+ if (opts->all_kernel)
+ return false;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (!evsel__is_dummy_event(evsel)) {
+ if (opts->all_user || !evsel->core.attr.exclude_user)
+ return true;
+ }
+ }
+
+ return false;
+}
+
static int record__config_tracking_events(struct record *rec)
{
struct record_opts *opts = &rec->opts;
struct evlist *evlist = rec->evlist;
+ bool system_wide = false;
struct evsel *evsel;
/*
@@ -919,7 +953,15 @@ static int record__config_tracking_events(struct record *rec)
*/
if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
perf_pmus__num_core_pmus() > 1) {
- evsel = evlist__findnew_tracking_event(evlist, false);
+
+ /*
+ * User space tasks can migrate between CPUs, so when tracing
+ * selected CPUs, sideband for all CPUs is still needed.
+ */
+ if (!!opts->target.cpu_list && record__tracking_system_wide(rec))
+ system_wide = true;
+
+ evsel = evlist__findnew_tracking_event(evlist, system_wide);
if (!evsel)
return -ENOMEM;
--
2.30.GIT
For dummy events that keep tracking, we may need to modify its cpu_maps.
For example, change the cpu_maps to record sideband events for all CPUS.
Add perf_evlist__go_system_wide() helper to support this scenario.
Signed-off-by: Yang Jihong <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
---
tools/lib/perf/evlist.c | 9 +++++++++
tools/lib/perf/include/internal/evlist.h | 2 ++
2 files changed, 11 insertions(+)
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index b8b066d0dc5e..3acbbccc1901 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -738,3 +738,12 @@ int perf_evlist__nr_groups(struct perf_evlist *evlist)
}
return nr_groups;
}
+
+void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
+{
+ if (!evsel->system_wide) {
+ evsel->system_wide = true;
+ if (evlist->needs_map_propagation)
+ __perf_evlist__propagate_maps(evlist, evsel);
+ }
+}
diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 3339bc2f1765..d86ffe8ed483 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -135,4 +135,6 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
void perf_evlist__reset_id_hash(struct perf_evlist *evlist);
void __perf_evlist__set_leader(struct list_head *list, struct perf_evsel *leader);
+
+void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel);
#endif /* __LIBPERF_INTERNAL_EVLIST_H */
--
2.30.GIT
Hi Arnaldo,
Can you consider applying this patchset ?
Please let me know if there is anything that needs to be fixed.
Yang,
Thanks
On 2023/8/21 9:27, Yang Jihong wrote:
> User space tasks can migrate between CPUs, track sideband events for all
> CPUs.
>
> The specific scenarios are as follows:
>
> CPU0 CPU1
> perf record -C 0 start
> taskA starts to be created and executed
> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
> events only deliver to CPU1
> ......
> |
> migrate to CPU0
> |
> Running on CPU0 <----------/
> ...
>
> perf record -C 0 stop
>
> Now perf samples the PC of taskA. However, perf does not record the
> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
> Therefore, the comm and symbols of taskA cannot be parsed.
>
> The sys_perf_event_open invoked is as follows:
>
> # perf --debug verbose=3 record -e cpu-clock -C 1 true
> <SNIP>
> Opening: cpu-clock
> ------------------------------------------------------------
> perf_event_attr:
> type 1 (PERF_TYPE_SOFTWARE)
> size 136
> config 0 (PERF_COUNT_SW_CPU_CLOCK)
> { sample_period, sample_freq } 4000
> sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
> read_format ID|LOST
> disabled 1
> inherit 1
> freq 1
> sample_id_all 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
> Opening: dummy:u
> ------------------------------------------------------------
> perf_event_attr:
> type 1 (PERF_TYPE_SOFTWARE)
> size 136
> config 0x9 (PERF_COUNT_SW_DUMMY)
> { sample_period, sample_freq } 1
> sample_type IP|TID|TIME|CPU|IDENTIFIER
> read_format ID|LOST
> inherit 1
> exclude_kernel 1
> exclude_hv 1
> mmap 1
> comm 1
> task 1
> sample_id_all 1
> exclude_guest 1
> mmap2 1
> comm_exec 1
> ksymbol 1
> bpf_event 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
> sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
> sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
> sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
> sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
> sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
> sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
> <SNIP>
>
> Changes since_v5:
> - No code changes.
> - Detailed commit message of patch3.
> - Add Acked-by and Tested-by tags from Adrian Hunter.
>
> Changes since_v4:
> - Simplify check code for record__tracking_system_wide().
> - Add perf attr test result to commit message for patch 7.
>
> Changes since_v3:
> - Check fall_kernel, all_user, and dummy or exclude_user when determining
> whether system wide is required.
>
> Changes since_v2:
> - Rename record_tracking.sh to record_sideband.sh in tools/perf/tests/shell.
> - Remove "perf evlist: Skip dummy event sample_type check for evlist_config" patch.
> - Add opts->all_kernel check in record__config_tracking_events().
> - Add perf_event_attr test for record selected CPUs exclude_user.
> - Update base-record & system-wide-dummy sample_type attr expected values for test-record-C0.
>
> Changes since v1:
> - Add perf_evlist__go_system_wide() via internal/evlist.h instead of
> exporting perf_evlist__propagate_maps().
> - Use evlist__add_aux_dummy() instead of evlist__add_dummy() in
> evlist__findnew_tracking_event().
> - Add a parameter in evlist__findnew_tracking_event() to deal with
> system_wide inside.
> - Add sideband for all CPUs when tracing selected CPUs comments on
> the perf record man page.
> - Use "sideband events" instead of "tracking events".
> - Adjust the patches Sequence.
> - Add patch5 to skip dummy event sample_type check for evlist_config.
> - Add patch6 to update system-wide-dummy attr values for perf test.
>
> Yang Jihong (7):
> perf evlist: Add perf_evlist__go_system_wide() helper
> perf evlist: Add evlist__findnew_tracking_event() helper
> perf record: Move setting dummy tracking before
> record__init_thread_masks()
> perf record: Track sideband events for all CPUs when tracing selected
> CPUs
> perf test: Update base-record & system-wide-dummy attr expected values
> for test-record-C0
> perf test: Add test case for record sideband events
> perf test: Add perf_event_attr test for record selected CPUs
> exclude_user
>
> tools/lib/perf/evlist.c | 9 ++
> tools/lib/perf/include/internal/evlist.h | 2 +
> tools/perf/Documentation/perf-record.txt | 3 +
> tools/perf/builtin-record.c | 106 +++++++++++++-----
> tools/perf/tests/attr/system-wide-dummy | 14 ++-
> tools/perf/tests/attr/test-record-C0 | 4 +-
> .../perf/tests/attr/test-record-C0-all-kernel | 32 ++++++
> tools/perf/tests/shell/record_sideband.sh | 44 ++++++++
> tools/perf/util/evlist.c | 18 +++
> tools/perf/util/evlist.h | 1 +
> 10 files changed, 198 insertions(+), 35 deletions(-)
> create mode 100644 tools/perf/tests/attr/test-record-C0-all-kernel
> create mode 100755 tools/perf/tests/shell/record_sideband.sh
>
Hello,
On 2023/8/25 12:51, Ian Rogers wrote:
> On Sun, Aug 20, 2023 at 6:30 PM Yang Jihong <[email protected]> wrote:
>>
>> For dummy events that keep tracking, we may need to modify its cpu_maps.
>> For example, change the cpu_maps to record sideband events for all CPUS.
>> Add perf_evlist__go_system_wide() helper to support this scenario.
>>
>> Signed-off-by: Yang Jihong <[email protected]>
>> Acked-by: Adrian Hunter <[email protected]>
>> ---
>> tools/lib/perf/evlist.c | 9 +++++++++
>> tools/lib/perf/include/internal/evlist.h | 2 ++
>> 2 files changed, 11 insertions(+)
>>
>> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
>> index b8b066d0dc5e..3acbbccc1901 100644
>> --- a/tools/lib/perf/evlist.c
>> +++ b/tools/lib/perf/evlist.c
>> @@ -738,3 +738,12 @@ int perf_evlist__nr_groups(struct perf_evlist *evlist)
>> }
>> return nr_groups;
>> }
>> +
>> +void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
>> +{
>> + if (!evsel->system_wide) {
>> + evsel->system_wide = true;
>> + if (evlist->needs_map_propagation)
>> + __perf_evlist__propagate_maps(evlist, evsel);
>> + }
>> +}
>
> I think this should be:
>
> void evsel__set_system_wide(struct evsel *evsel)
> {
> if (evsel->system_wide)
> return;
> evsel->system_wide = true;
> if (evsel->evlist->core.needs_map_propagation)
> ...
>
> The API being on evlist makes it look like all evsels are affected.
>
This part of the code is implemented according to Adrian's suggestion.
Refer to:
https://lore.kernel.org/linux-perf-users/[email protected]/
The logic of both is the same, and either is OK for me.
If really want to change it, please let me know.
Thanks,
Yang
Hello,
On 2023/8/25 13:45, Ian Rogers wrote:
>
>
> On Thu, Aug 24, 2023, 10:41 PM Yang Jihong <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hello,
>
> On 2023/8/25 12:51, Ian Rogers wrote:
> > On Sun, Aug 20, 2023 at 6:30 PM Yang Jihong
> <[email protected] <mailto:[email protected]>> wrote:
> >>
> >> For dummy events that keep tracking, we may need to modify its
> cpu_maps.
> >> For example, change the cpu_maps to record sideband events for
> all CPUS.
> >> Add perf_evlist__go_system_wide() helper to support this scenario.
> >>
> >> Signed-off-by: Yang Jihong <[email protected]
> <mailto:[email protected]>>
> >> Acked-by: Adrian Hunter <[email protected]
> <mailto:[email protected]>>
> >> ---
> >> tools/lib/perf/evlist.c | 9 +++++++++
> >> tools/lib/perf/include/internal/evlist.h | 2 ++
> >> 2 files changed, 11 insertions(+)
> >>
> >> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> >> index b8b066d0dc5e..3acbbccc1901 100644
> >> --- a/tools/lib/perf/evlist.c
> >> +++ b/tools/lib/perf/evlist.c
> >> @@ -738,3 +738,12 @@ int perf_evlist__nr_groups(struct
> perf_evlist *evlist)
> >> }
> >> return nr_groups;
> >> }
> >> +
> >> +void perf_evlist__go_system_wide(struct perf_evlist *evlist,
> struct perf_evsel *evsel)
> >> +{
> >> + if (!evsel->system_wide) {
> >> + evsel->system_wide = true;
> >> + if (evlist->needs_map_propagation)
> >> + __perf_evlist__propagate_maps(evlist,
> evsel);
> >> + }
> >> +}
> >
> > I think this should be:
> >
> > void evsel__set_system_wide(struct evsel *evsel)
> > {
> > if (evsel->system_wide)
> > return;
> > evsel->system_wide = true;
> > if (evsel->evlist->core.needs_map_propagation)
> > ...
> >
> > The API being on evlist makes it look like all evsels are affected.
> >
> This part of the code is implemented according to Adrian's suggestion.
> Refer to:
>
> https://lore.kernel.org/linux-perf-users/[email protected]/
> <https://lore.kernel.org/linux-perf-users/[email protected]/>
>
> The logic of both is the same, and either is OK for me.
> If really want to change it, please let me know.
>
>
> Yes, I think the naming isn't correct and the function being on evlist
> is misleading.
Uh, I have a little problem here, too.
Because perf_evlist__go_system_wide() needs to invoke the
__perf_evlist__propagate_maps(), which is a local function and is
located in the evlist.c file in tools/lib/perf/.
So perf_evlist__go_system_wide() can only be located in this file. The
prefixes of all funcstions in this file are "perf_evlist__". Therefore,
it is better to use the original names.
In addition, __perf_evlist__propagate_maps affects the evlist, so it is
not misleading.
Thanks,
Yang
Hello,
On 2023/8/25 13:45, Ian Rogers wrote:
>
>
> On Thu, Aug 24, 2023, 10:41 PM Yang Jihong <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hello,
>
> On 2023/8/25 12:51, Ian Rogers wrote:
> > On Sun, Aug 20, 2023 at 6:30 PM Yang Jihong
> <[email protected] <mailto:[email protected]>> wrote:
> >>
> >> For dummy events that keep tracking, we may need to modify its
> cpu_maps.
> >> For example, change the cpu_maps to record sideband events for
> all CPUS.
> >> Add perf_evlist__go_system_wide() helper to support this scenario.
> >>
> >> Signed-off-by: Yang Jihong <[email protected]
> <mailto:[email protected]>>
> >> Acked-by: Adrian Hunter <[email protected]
> <mailto:[email protected]>>
> >> ---
> >> tools/lib/perf/evlist.c | 9 +++++++++
> >> tools/lib/perf/include/internal/evlist.h | 2 ++
> >> 2 files changed, 11 insertions(+)
> >>
> >> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> >> index b8b066d0dc5e..3acbbccc1901 100644
> >> --- a/tools/lib/perf/evlist.c
> >> +++ b/tools/lib/perf/evlist.c
> >> @@ -738,3 +738,12 @@ int perf_evlist__nr_groups(struct
> perf_evlist *evlist)
> >> }
> >> return nr_groups;
> >> }
> >> +
> >> +void perf_evlist__go_system_wide(struct perf_evlist *evlist,
> struct perf_evsel *evsel)
> >> +{
> >> + if (!evsel->system_wide) {
> >> + evsel->system_wide = true;
> >> + if (evlist->needs_map_propagation)
> >> + __perf_evlist__propagate_maps(evlist,
> evsel);
> >> + }
> >> +}
> >
> > I think this should be:
> >
> > void evsel__set_system_wide(struct evsel *evsel)
> > {
> > if (evsel->system_wide)
> > return;
> > evsel->system_wide = true;
> > if (evsel->evlist->core.needs_map_propagation)
> > ...
> >
> > The API being on evlist makes it look like all evsels are affected.
> >
> This part of the code is implemented according to Adrian's suggestion.
> Refer to:
>
> https://lore.kernel.org/linux-perf-users/[email protected]/
> <https://lore.kernel.org/linux-perf-users/[email protected]/>
>
> The logic of both is the same, and either is OK for me.
> If really want to change it, please let me know.
>
>
> Yes, I think the naming isn't correct and the function being on evlist
> is misleading.
>
Okay, I'll follow the above changes in the next version.
Thanks,
Yang
Hello,
On 2023/8/25 13:17, Ian Rogers wrote:
> On Sun, Aug 20, 2023 at 6:30 PM Yang Jihong <[email protected]> wrote:
>>
>> User space tasks can migrate between CPUs, we need to track side-band
>> events for all CPUs.
>>
>> The specific scenarios are as follows:
>>
>> CPU0 CPU1
>> perf record -C 0 start
>> taskA starts to be created and executed
>> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>> events only deliver to CPU1
>> ......
>> |
>> migrate to CPU0
>> |
>> Running on CPU0 <----------/
>> ...
>>
>> perf record -C 0 stop
>>
>> Now perf samples the PC of taskA. However, perf does not record the
>> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
>> Therefore, the comm and symbols of taskA cannot be parsed.
>>
>> The solution is to record sideband events for all CPUs when tracing
>> selected CPUs. Because this modifies the default behavior, add related
>> comments to the perf record man page.
>>
>> The sys_perf_event_open invoked is as follows:
>>
>> # perf --debug verbose=3 record -e cpu-clock -C 1 true
>> <SNIP>
>> Opening: cpu-clock
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 1 (PERF_TYPE_SOFTWARE)
>> size 136
>> config 0 (PERF_COUNT_SW_CPU_CLOCK)
>> { sample_period, sample_freq } 4000
>> sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
>> read_format ID|LOST
>> disabled 1
>> inherit 1
>> freq 1
>> sample_id_all 1
>> exclude_guest 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
>> Opening: dummy:u
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 1 (PERF_TYPE_SOFTWARE)
>> size 136
>> config 0x9 (PERF_COUNT_SW_DUMMY)
>> { sample_period, sample_freq } 1
>> sample_type IP|TID|TIME|CPU|IDENTIFIER
>> read_format ID|LOST
>> inherit 1
>> exclude_kernel 1
>> exclude_hv 1
>> mmap 1
>> comm 1
>> task 1
>> sample_id_all 1
>> exclude_guest 1
>> mmap2 1
>> comm_exec 1
>> ksymbol 1
>> bpf_event 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
>> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
>> sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
>> sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
>> sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
>> sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
>> sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
>> sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
>> <SNIP>
>>
>> Signed-off-by: Yang Jihong <[email protected]>
>> Acked-by: Adrian Hunter <[email protected]>
>> ---
>> tools/perf/Documentation/perf-record.txt | 3 ++
>> tools/perf/builtin-record.c | 44 +++++++++++++++++++++++-
>> 2 files changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index d5217be012d7..1889f66addf2 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
>> In per-thread mode with inheritance mode on (default), samples are captured only when
>> the thread executes on the designated CPUs. Default is to monitor all CPUs.
>>
>> +User space tasks can migrate between CPUs, so when tracing selected CPUs,
>> +a dummy event is created to track sideband for all CPUs.
>> +
>> -B::
>> --no-buildid::
>> Do not save the build ids of binaries in the perf.data files. This skips
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 4ee94058028f..ae2e21b945fa 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
>> return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>> }
>>
>> +static bool record__tracking_system_wide(struct record *rec)
>
> I think this would be better named something like:
> record__need_system_wide_dummy_event
>
Okay, it'll be modified in the next version.
Thanks,
Yang