2023-07-04 07:47:30

by Yang Jihong

[permalink] [raw]
Subject: [PATCH 0/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.

The specific scenarios are as follows:

CPU0 CPU1
perf record -C 0 start
taskA starts to be created and executed
-> PERF_RECORD_COMM and PERF_RECORD_MMAP
events only deliver to CPU1
......
|
migrate to CPU0
|
Running on CPU0 <----------/
...

perf record -C 0 stop

Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.

The sys_perf_event_open invoked is as follows:

# perf --debug verbose=3 record -e cpu-clock -C 1 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|CPU|PERIOD
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
Opening: dummy:HG
------------------------------------------------------------
perf_event_attr:
type 1
size 136
config 0x9
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|CPU|PERIOD
read_format ID|LOST
inherit 1
mmap 1
comm 1
freq 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
<SNIP>

1. patch1 and patch2 are pre-patch and are used to provide two helpers.
One is used to modify cpu_maps,
and the other is used to search for dummy tracking evsel in the evlist.
2. patch3 is the main patch, track side-band events for all CPUs when
tracing selected CPUs
3. patch4 adds the corresponding test case.
4. patch5 is code optimization.

Yang Jihong (5):
perf evlist: Export perf_evlist__propagate_maps()
perf evlist: Add evlist__findnew_tracking_event() helper
perf record: Tracking side-band events for all CPUs when tracing
selected CPUs
perf test: Add test case for record tracking
perf record: All config tracking are integrated into
record__config_tracking_events()

tools/lib/perf/evlist.c | 23 +++----
tools/lib/perf/include/perf/evlist.h | 2 +
tools/perf/builtin-record.c | 74 ++++++++++++++---------
tools/perf/tests/shell/record_tracking.sh | 44 ++++++++++++++
tools/perf/util/evlist.c | 17 ++++++
tools/perf/util/evlist.h | 1 +
6 files changed, 120 insertions(+), 41 deletions(-)
create mode 100755 tools/perf/tests/shell/record_tracking.sh

--
2.30.GIT



2023-07-04 07:52:57

by Yang Jihong

[permalink] [raw]
Subject: [PATCH 2/5] perf evlist: Add evlist__findnew_tracking_event() helper

Currently, intel-bts, intel-pt, and arm-spe may add a dummy event for
tracking to the evlist. We may need to search for the dummy event for
some settings. Therefore, add evlist__findnew_tracking_event() helper.

Signed-off-by: Yang Jihong <[email protected]>
---
tools/perf/builtin-record.c | 11 +++--------
tools/perf/util/evlist.c | 17 +++++++++++++++++
tools/perf/util/evlist.h | 1 +
3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index aec18db7ff23..8872cd037f2c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1295,14 +1295,9 @@ static int record__open(struct record *rec)
*/
if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
perf_pmus__num_core_pmus() > 1) {
- pos = evlist__get_tracking_event(evlist);
- if (!evsel__is_dummy_event(pos)) {
- /* Set up dummy event. */
- if (evlist__add_dummy(evlist))
- return -ENOMEM;
- pos = evlist__last(evlist);
- evlist__set_tracking_event(evlist, pos);
- }
+ pos = evlist__findnew_tracking_event(evlist);
+ if (!pos)
+ return -ENOMEM;

/*
* Enable the dummy event when the process is forked for
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7ef43f72098e..4621ddaeb8f3 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1694,6 +1694,23 @@ void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_ev
tracking_evsel->tracking = true;
}

+struct evsel *evlist__findnew_tracking_event(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evsel = evlist__get_tracking_event(evlist);
+ if (!evsel__is_dummy_event(evsel)) {
+ /* Set up dummy event. */
+ if (evlist__add_dummy(evlist))
+ return NULL;
+
+ evsel = evlist__last(evlist);
+ evlist__set_tracking_event(evlist, evsel);
+ }
+
+ return evsel;
+}
+
struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str)
{
struct evsel *evsel;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 664c6bf7b3e0..4d28c50ba842 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -387,6 +387,7 @@ bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr);

struct evsel *evlist__get_tracking_event(struct evlist *evlist);
void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel);
+struct evsel *evlist__findnew_tracking_event(struct evlist *evlist);

struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str);

--
2.30.GIT


2023-07-04 07:54:35

by Yang Jihong

[permalink] [raw]
Subject: [PATCH 5/5] perf record: All config tracking are integrated into record__config_tracking_events()

The current perf-record also config tracking events in record__open(),
move it to the record__config_tracking_events().

The sys_perf_event_open invoked is as follows:

# perf --debug verbose=3 record -e cpu-clock -D 100 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|PERIOD
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 3569 cpu 0 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 3569 cpu 1 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 3569 cpu 2 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 3569 cpu 3 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid 3569 cpu 4 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid 3569 cpu 5 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid 3569 cpu 6 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid 3569 cpu 7 group_fd -1 flags 0x8 = 13
Opening: dummy:HG
------------------------------------------------------------
perf_event_attr:
type 1
size 136
config 0x9
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|PERIOD
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 3569 cpu 0 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid 3569 cpu 1 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid 3569 cpu 2 group_fd -1 flags 0x8 = 16
sys_perf_event_open: pid 3569 cpu 3 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 3569 cpu 4 group_fd -1 flags 0x8 = 18
sys_perf_event_open: pid 3569 cpu 5 group_fd -1 flags 0x8 = 19
sys_perf_event_open: pid 3569 cpu 6 group_fd -1 flags 0x8 = 20
sys_perf_event_open: pid 3569 cpu 7 group_fd -1 flags 0x8 = 21
<SNIP>

Signed-off-by: Yang Jihong <[email protected]>
---
tools/perf/builtin-record.c | 46 ++++++++++++++++---------------------
1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 69e0d8c75aab..1e21f64e4cfa 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -915,15 +915,31 @@ static int record__config_tracking_events(struct record *rec)
struct record_opts *opts = &rec->opts;

/*
- * User space tasks can migrate between CPUs, so when tracing
- * selected CPUs, sideband for all CPUs is still needed.
+ * For initial_delay, system wide or a hybrid system, we need to add a
+ * dummy event so that we can track PERF_RECORD_MMAP to cover the delay
+ * of waiting or event synthesis.
*/
- if (opts->target.cpu_list) {
+ if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
+ perf_pmus__num_core_pmus() > 1) {
evsel = evlist__findnew_tracking_event(evlist);
if (!evsel)
return -ENOMEM;

- if (!evsel->core.system_wide) {
+ /*
+ * Enable the dummy event when the process is forked for
+ * initial_delay, immediately for system wide.
+ */
+ if (opts->target.initial_delay && !evsel->immediate &&
+ !target__has_cpu(&opts->target))
+ evsel->core.attr.enable_on_exec = 1;
+ else
+ evsel->immediate = 1;
+
+ /*
+ * User space tasks can migrate between CPUs, so when tracing
+ * selected CPUs, sideband for all CPUs is still needed.
+ */
+ if (opts->target.cpu_list && !evsel->core.system_wide) {
evsel->core.system_wide = true;
evsel__set_sample_bit(evsel, TIME);
perf_evlist__propagate_maps(&evlist->core, &evsel->core);
@@ -1313,28 +1329,6 @@ static int record__open(struct record *rec)
struct record_opts *opts = &rec->opts;
int rc = 0;

- /*
- * For initial_delay, system wide or a hybrid system, we need to add a
- * dummy event so that we can track PERF_RECORD_MMAP to cover the delay
- * of waiting or event synthesis.
- */
- if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
- perf_pmus__num_core_pmus() > 1) {
- pos = evlist__findnew_tracking_event(evlist);
- if (!pos)
- return -ENOMEM;
-
- /*
- * Enable the dummy event when the process is forked for
- * initial_delay, immediately for system wide.
- */
- if (opts->target.initial_delay && !pos->immediate &&
- !target__has_cpu(&opts->target))
- pos->core.attr.enable_on_exec = 1;
- else
- pos->immediate = 1;
- }
-
evlist__config(evlist, opts, &callchain_param);

evlist__for_each_entry(evlist, pos) {
--
2.30.GIT


2023-07-04 07:56:21

by Yang Jihong

[permalink] [raw]
Subject: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.

The specific scenarios are as follows:

CPU0 CPU1
perf record -C 0 start
taskA starts to be created and executed
-> PERF_RECORD_COMM and PERF_RECORD_MMAP
events only deliver to CPU1
......
|
migrate to CPU0
|
Running on CPU0 <----------/
...

perf record -C 0 stop

Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.

The sys_perf_event_open invoked is as follows:

# perf --debug verbose=3 record -e cpu-clock -C 1 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|CPU|PERIOD
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
Opening: dummy:HG
------------------------------------------------------------
perf_event_attr:
type 1
size 136
config 0x9
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|CPU|PERIOD
read_format ID|LOST
inherit 1
mmap 1
comm 1
freq 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
<SNIP>

Signed-off-by: Yang Jihong <[email protected]>
---
tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8872cd037f2c..69e0d8c75aab 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
}

+static int record__config_tracking_events(struct record *rec)
+{
+ struct evsel *evsel;
+ struct evlist *evlist = rec->evlist;
+ struct record_opts *opts = &rec->opts;
+
+ /*
+ * User space tasks can migrate between CPUs, so when tracing
+ * selected CPUs, sideband for all CPUs is still needed.
+ */
+ if (opts->target.cpu_list) {
+ evsel = evlist__findnew_tracking_event(evlist);
+ if (!evsel)
+ return -ENOMEM;
+
+ if (!evsel->core.system_wide) {
+ evsel->core.system_wide = true;
+ evsel__set_sample_bit(evsel, TIME);
+ perf_evlist__propagate_maps(&evlist->core, &evsel->core);
+ }
+ }
+
+ return 0;
+}
+
static bool record__kcore_readable(struct machine *machine)
{
char kcore[PATH_MAX];
@@ -4235,6 +4260,12 @@ int cmd_record(int argc, const char **argv)
goto out;
}

+ err = record__config_tracking_events(rec);
+ if (err) {
+ pr_err("record__config_tracking_events failed, error %d\n", err);
+ goto out;
+ }
+
err = record__init_thread_masks(rec);
if (err) {
pr_err("Failed to initialize parallel data streaming masks\n");
--
2.30.GIT


2023-07-04 07:57:13

by Yang Jihong

[permalink] [raw]
Subject: [PATCH 4/5] perf test: Add test case for record tracking

Add a new test case to record tracking side-band events for all CPUs when
tracing selected CPUs

Test result:

# ./perf test list 2>&1 | grep 'perf record tracking tests'
95: perf record tracking tests
f# ./perf test 95
95: perf record tracking tests : Ok

Signed-off-by: Yang Jihong <[email protected]>
---
tools/perf/tests/shell/record_tracking.sh | 44 +++++++++++++++++++++++
1 file changed, 44 insertions(+)
create mode 100755 tools/perf/tests/shell/record_tracking.sh

diff --git a/tools/perf/tests/shell/record_tracking.sh b/tools/perf/tests/shell/record_tracking.sh
new file mode 100755
index 000000000000..fe05f4772999
--- /dev/null
+++ b/tools/perf/tests/shell/record_tracking.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+# perf record tracking tests
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+
+err=0
+perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+
+can_cpu_wide()
+{
+ if ! perf record -o ${perfdata} -BN --no-bpf-event -e dummy:u -C $1 true 2>&1 >/dev/null
+ then
+ echo "record tracking test [Skipped cannot record cpu$1]"
+ err=2
+ fi
+
+ rm -f ${perfdata}
+ return $err
+}
+
+test_system_wide_tracking()
+{
+ # Need CPU 0 and CPU 1
+ can_cpu_wide 0 || return 0
+ can_cpu_wide 1 || return 0
+
+ # Record on CPU 0 a task running on CPU 1
+ perf record -BN --no-bpf-event -o ${perfdata} -e dummy:u -C 0 -- taskset --cpu-list 1 true
+
+ # Should get MMAP events from CPU 1
+ mmap_cnt=`perf script -i ${perfdata} --show-mmap-events -C 1 2>/dev/null | grep MMAP | wc -l`
+
+ rm -f ${perfdata}
+
+ if [ ${mmap_cnt} -gt 0 ] ; then
+ return 0
+ fi
+
+ echo "Failed to record MMAP events on CPU 1 when tracing CPU 0"
+ return 1
+}
+
+test_system_wide_tracking
--
2.30.GIT


2023-07-04 07:59:35

by Yang Jihong

[permalink] [raw]
Subject: [PATCH 1/5] perf evlist: Export perf_evlist__propagate_maps()

For dummy events that keep tracking, we may need to modify its cpu_maps.
For example, change the cpu_maps to track side-band events for all CPUS.
Export perf_evlist__propagate_maps () to support this scenario.

No functional change.

Signed-off-by: Yang Jihong <[email protected]>
---
tools/lib/perf/evlist.c | 23 +++++++++--------------
tools/lib/perf/include/perf/evlist.h | 2 ++
2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index b8b066d0dc5e..a3057b692530 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -33,8 +33,8 @@ void perf_evlist__init(struct perf_evlist *evlist)
perf_evlist__reset_id_hash(evlist);
}

-static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
- struct perf_evsel *evsel)
+void perf_evlist__propagate_maps(struct perf_evlist *evlist,
+ struct perf_evsel *evsel)
{
if (evsel->system_wide) {
/* System wide: set the cpu map of the evsel to all online CPUs. */
@@ -78,16 +78,6 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
evlist->all_cpus = perf_cpu_map__merge(evlist->all_cpus, evsel->cpus);
}

-static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
-{
- struct perf_evsel *evsel;
-
- evlist->needs_map_propagation = true;
-
- perf_evlist__for_each_evsel(evlist, evsel)
- __perf_evlist__propagate_maps(evlist, evsel);
-}
-
void perf_evlist__add(struct perf_evlist *evlist,
struct perf_evsel *evsel)
{
@@ -96,7 +86,7 @@ void perf_evlist__add(struct perf_evlist *evlist,
evlist->nr_entries += 1;

if (evlist->needs_map_propagation)
- __perf_evlist__propagate_maps(evlist, evsel);
+ perf_evlist__propagate_maps(evlist, evsel);
}

void perf_evlist__remove(struct perf_evlist *evlist,
@@ -175,6 +165,8 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
struct perf_cpu_map *cpus,
struct perf_thread_map *threads)
{
+ struct perf_evsel *evsel;
+
/*
* Allow for the possibility that one or another of the maps isn't being
* changed i.e. don't put it. Note we are assuming the maps that are
@@ -192,7 +184,10 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
evlist->threads = perf_thread_map__get(threads);
}

- perf_evlist__propagate_maps(evlist);
+ evlist->needs_map_propagation = true;
+
+ perf_evlist__for_each_evsel(evlist, evsel)
+ perf_evlist__propagate_maps(evlist, evsel);
}

int perf_evlist__open(struct perf_evlist *evlist)
diff --git a/tools/lib/perf/include/perf/evlist.h b/tools/lib/perf/include/perf/evlist.h
index e894b770779e..d5a2569b2177 100644
--- a/tools/lib/perf/include/perf/evlist.h
+++ b/tools/lib/perf/include/perf/evlist.h
@@ -48,4 +48,6 @@ LIBPERF_API struct perf_mmap *perf_evlist__next_mmap(struct perf_evlist *evlist,

LIBPERF_API void perf_evlist__set_leader(struct perf_evlist *evlist);
LIBPERF_API int perf_evlist__nr_groups(struct perf_evlist *evlist);
+LIBPERF_API void perf_evlist__propagate_maps(struct perf_evlist *evlist,
+ struct perf_evsel *evsel);
#endif /* __LIBPERF_EVLIST_H */
--
2.30.GIT


2023-07-05 21:30:31

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

On Tue, Jul 4, 2023 at 12:44 AM Yang Jihong <[email protected]> wrote:
>
> User space tasks can migrate between CPUs, we need to track side-band
> events for all CPUs.
>
> The specific scenarios are as follows:
>
> CPU0 CPU1
> perf record -C 0 start
> taskA starts to be created and executed
> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
> events only deliver to CPU1
> ......
> |
> migrate to CPU0
> |
> Running on CPU0 <----------/
> ...
>
> perf record -C 0 stop

But I'm curious why you don't limit the task to run on the
specified CPUs only (using taskset).

Also, as you may know, you don't need to specify -C if you
want to profile specific tasks only. It'll open per-cpu, per-task
events and they will have all necessary info.

>
> Now perf samples the PC of taskA. However, perf does not record the
> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.

_COMM and _MMAP right?

Thanks,
Namhyung


> Therefore, the comm and symbols of taskA cannot be parsed.
>
> The sys_perf_event_open invoked is as follows:
>
> # perf --debug verbose=3 record -e cpu-clock -C 1 true
> <SNIP>
> Opening: cpu-clock
> ------------------------------------------------------------
> perf_event_attr:
> type 1
> size 136
> { sample_period, sample_freq } 4000
> sample_type IP|TID|TIME|ID|CPU|PERIOD
> read_format ID|LOST
> disabled 1
> inherit 1
> freq 1
> sample_id_all 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
> Opening: dummy:HG
> ------------------------------------------------------------
> perf_event_attr:
> type 1
> size 136
> config 0x9
> { sample_period, sample_freq } 4000
> sample_type IP|TID|TIME|ID|CPU|PERIOD
> read_format ID|LOST
> inherit 1
> mmap 1
> comm 1
> freq 1
> task 1
> sample_id_all 1
> mmap2 1
> comm_exec 1
> ksymbol 1
> bpf_event 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
> sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
> sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
> sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
> sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
> sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
> sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
> <SNIP>
>
> Signed-off-by: Yang Jihong <[email protected]>
> ---
> tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 8872cd037f2c..69e0d8c75aab 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
> return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
> }
>
> +static int record__config_tracking_events(struct record *rec)
> +{
> + struct evsel *evsel;
> + struct evlist *evlist = rec->evlist;
> + struct record_opts *opts = &rec->opts;
> +
> + /*
> + * User space tasks can migrate between CPUs, so when tracing
> + * selected CPUs, sideband for all CPUs is still needed.
> + */
> + if (opts->target.cpu_list) {
> + evsel = evlist__findnew_tracking_event(evlist);
> + if (!evsel)
> + return -ENOMEM;
> +
> + if (!evsel->core.system_wide) {
> + evsel->core.system_wide = true;
> + evsel__set_sample_bit(evsel, TIME);
> + perf_evlist__propagate_maps(&evlist->core, &evsel->core);
> + }
> + }
> +
> + return 0;
> +}
> +
> static bool record__kcore_readable(struct machine *machine)
> {
> char kcore[PATH_MAX];
> @@ -4235,6 +4260,12 @@ int cmd_record(int argc, const char **argv)
> goto out;
> }
>
> + err = record__config_tracking_events(rec);
> + if (err) {
> + pr_err("record__config_tracking_events failed, error %d\n", err);
> + goto out;
> + }
> +
> err = record__init_thread_masks(rec);
> if (err) {
> pr_err("Failed to initialize parallel data streaming masks\n");
> --
> 2.30.GIT
>

2023-07-06 02:00:29

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

Hello,

On 2023/7/6 5:09, Namhyung Kim wrote:
> On Tue, Jul 4, 2023 at 12:44 AM Yang Jihong <[email protected]> wrote:
>>
>> User space tasks can migrate between CPUs, we need to track side-band
>> events for all CPUs.
>>
>> The specific scenarios are as follows:
>>
>> CPU0 CPU1
>> perf record -C 0 start
>> taskA starts to be created and executed
>> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>> events only deliver to CPU1
>> ......
>> |
>> migrate to CPU0
>> |
>> Running on CPU0 <----------/
>> ...
>>
>> perf record -C 0 stop
>
> But I'm curious why you don't limit the task to run on the
> specified CPUs only (using taskset).
>
> Also, as you may know, you don't need to specify -C if you
> want to profile specific tasks only. It'll open per-cpu, per-task
> events and they will have all necessary info.
>
The actual application scenario is to perform perf records only for
specified cores. However, during sampling, the system may create new
processes and then migrate the processes between cores due to
scheduling. If the processes run on the selected core, In this case, the
perf report cannot parse symbols for these processes.
>>
>> Now perf samples the PC of taskA. However, perf does not record the
>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
>
> _COMM and _MMAP right?
>
Yes, PERF_RECORD_COMM and PERF_RECORD_MMAP. There's a clerical error here...

Thanks,
Yang

2023-07-11 13:30:37

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 2/5] perf evlist: Add evlist__findnew_tracking_event() helper

On 4/07/23 10:42, Yang Jihong wrote:
> Currently, intel-bts, intel-pt, and arm-spe may add a dummy event for
> tracking to the evlist. We may need to search for the dummy event for
> some settings. Therefore, add evlist__findnew_tracking_event() helper.
>
> Signed-off-by: Yang Jihong <[email protected]>
> ---
> tools/perf/builtin-record.c | 11 +++--------
> tools/perf/util/evlist.c | 17 +++++++++++++++++
> tools/perf/util/evlist.h | 1 +
> 3 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index aec18db7ff23..8872cd037f2c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1295,14 +1295,9 @@ static int record__open(struct record *rec)
> */
> if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> perf_pmus__num_core_pmus() > 1) {
> - pos = evlist__get_tracking_event(evlist);
> - if (!evsel__is_dummy_event(pos)) {
> - /* Set up dummy event. */
> - if (evlist__add_dummy(evlist))
> - return -ENOMEM;
> - pos = evlist__last(evlist);
> - evlist__set_tracking_event(evlist, pos);
> - }
> + pos = evlist__findnew_tracking_event(evlist);
> + if (!pos)
> + return -ENOMEM;
>
> /*
> * Enable the dummy event when the process is forked for
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 7ef43f72098e..4621ddaeb8f3 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1694,6 +1694,23 @@ void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_ev
> tracking_evsel->tracking = true;
> }
>
> +struct evsel *evlist__findnew_tracking_event(struct evlist *evlist)
> +{
> + struct evsel *evsel;
> +
> + evsel = evlist__get_tracking_event(evlist);
> + if (!evsel__is_dummy_event(evsel)) {
> + /* Set up dummy event. */
> + if (evlist__add_dummy(evlist))

evlist__add_dummy() does not exclude_kernel so it
will end up relying on evsel__fallback() to work in
cases where the user does not have kernel access.

evlist__add_aux_dummy() is probably better suited.
Consequently perhaps pass system_wide as
a parameter to evlist__findnew_tracking_event() and
deal with that all inside evlist__findnew_tracking_event()

> + return NULL;
> +
> + evsel = evlist__last(evlist);
> + evlist__set_tracking_event(evlist, evsel);
> + }
> +
> + return evsel;
> +}
> +
> struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str)
> {
> struct evsel *evsel;
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 664c6bf7b3e0..4d28c50ba842 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -387,6 +387,7 @@ bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr);
>
> struct evsel *evlist__get_tracking_event(struct evlist *evlist);
> void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel);
> +struct evsel *evlist__findnew_tracking_event(struct evlist *evlist);
>
> struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str);
>


2023-07-11 13:36:55

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

On 4/07/23 10:42, Yang Jihong wrote:
> User space tasks can migrate between CPUs, we need to track side-band
> events for all CPUs.
>
> The specific scenarios are as follows:
>
> CPU0 CPU1
> perf record -C 0 start
> taskA starts to be created and executed
> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
> events only deliver to CPU1
> ......
> |
> migrate to CPU0
> |
> Running on CPU0 <----------/
> ...
>
> perf record -C 0 stop
>
> Now perf samples the PC of taskA. However, perf does not record the
> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
> Therefore, the comm and symbols of taskA cannot be parsed.
>
> The sys_perf_event_open invoked is as follows:
>
> # perf --debug verbose=3 record -e cpu-clock -C 1 true
> <SNIP>
> Opening: cpu-clock
> ------------------------------------------------------------
> perf_event_attr:
> type 1
> size 136
> { sample_period, sample_freq } 4000
> sample_type IP|TID|TIME|ID|CPU|PERIOD
> read_format ID|LOST
> disabled 1
> inherit 1
> freq 1
> sample_id_all 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
> Opening: dummy:HG
> ------------------------------------------------------------
> perf_event_attr:
> type 1
> size 136
> config 0x9
> { sample_period, sample_freq } 4000
> sample_type IP|TID|TIME|ID|CPU|PERIOD
> read_format ID|LOST
> inherit 1
> mmap 1
> comm 1
> freq 1
> task 1
> sample_id_all 1
> mmap2 1
> comm_exec 1
> ksymbol 1
> bpf_event 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
> sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
> sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
> sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
> sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
> sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
> sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
> <SNIP>
>
> Signed-off-by: Yang Jihong <[email protected]>
> ---
> tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 8872cd037f2c..69e0d8c75aab 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
> return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
> }
>
> +static int record__config_tracking_events(struct record *rec)
> +{
> + struct evsel *evsel;
> + struct evlist *evlist = rec->evlist;
> + struct record_opts *opts = &rec->opts;
> +
> + /*
> + * User space tasks can migrate between CPUs, so when tracing
> + * selected CPUs, sideband for all CPUs is still needed.
> + */
> + if (opts->target.cpu_list) {

I am not sure if anyone minds doing this by default, but perhaps
we should say something about it on the perf record man page.

> + evsel = evlist__findnew_tracking_event(evlist);
> + if (!evsel)
> + return -ENOMEM;
> +
> + if (!evsel->core.system_wide) {
> + evsel->core.system_wide = true;
> + evsel__set_sample_bit(evsel, TIME);
> + perf_evlist__propagate_maps(&evlist->core, &evsel->core);
> + }

Perhaps better to export via internel/evsel.h

void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
{
if (!evsel->system_wide) {
evsel->system_wide = true;
if (evlist->needs_map_propagation)
__perf_evlist__propagate_maps(evlist, evsel);
}
}

As suggested in response to patch 2, perhaps deal with system_wide
inside evlist__findnew_tracking_event()

> + }
> +
> + return 0;
> +}
> +
> static bool record__kcore_readable(struct machine *machine)
> {
> char kcore[PATH_MAX];
> @@ -4235,6 +4260,12 @@ int cmd_record(int argc, const char **argv)
> goto out;
> }
>
> + err = record__config_tracking_events(rec);
> + if (err) {
> + pr_err("record__config_tracking_events failed, error %d\n", err);
> + goto out;
> + }
> +
> err = record__init_thread_masks(rec);
> if (err) {
> pr_err("Failed to initialize parallel data streaming masks\n");


2023-07-11 13:38:19

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 1/5] perf evlist: Export perf_evlist__propagate_maps()

On 4/07/23 10:42, Yang Jihong wrote:
> For dummy events that keep tracking, we may need to modify its cpu_maps.
> For example, change the cpu_maps to track side-band events for all CPUS.
> Export perf_evlist__propagate_maps () to support this scenario.

__perf_evlist__propagate_maps() is quite low-level so it would be better
to avoid exporting it.


2023-07-12 14:47:00

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH 1/5] perf evlist: Export perf_evlist__propagate_maps()

Hello,

On 2023/7/11 21:12, Adrian Hunter wrote:
> On 4/07/23 10:42, Yang Jihong wrote:
>> For dummy events that keep tracking, we may need to modify its cpu_maps.
>> For example, change the cpu_maps to track side-band events for all CPUS.
>> Export perf_evlist__propagate_maps () to support this scenario.
>
> __perf_evlist__propagate_maps() is quite low-level so it would be better
> to avoid exporting it.
>
>
Or can we export it via internal/evlist.h?
Because as mentioned in patch 2:

void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct
perf_evsel *evsel)
{
if (!evsel->system_wide) {
evsel->system_wide = true;
if (evlist->needs_map_propagation)
__perf_evlist__propagate_maps(evlist, evsel);
}
}
This interface needs to invoke __perf_evlist__propagate_maps.

Thanks,
Yang

2023-07-12 14:49:16

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH 2/5] perf evlist: Add evlist__findnew_tracking_event() helper

Hello,

On 2023/7/11 21:13, Adrian Hunter wrote:
> On 4/07/23 10:42, Yang Jihong wrote:
>> Currently, intel-bts, intel-pt, and arm-spe may add a dummy event for
>> tracking to the evlist. We may need to search for the dummy event for
>> some settings. Therefore, add evlist__findnew_tracking_event() helper.
>>
>> Signed-off-by: Yang Jihong <[email protected]>
>> ---
>> tools/perf/builtin-record.c | 11 +++--------
>> tools/perf/util/evlist.c | 17 +++++++++++++++++
>> tools/perf/util/evlist.h | 1 +
>> 3 files changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index aec18db7ff23..8872cd037f2c 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -1295,14 +1295,9 @@ static int record__open(struct record *rec)
>> */
>> if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
>> perf_pmus__num_core_pmus() > 1) {
>> - pos = evlist__get_tracking_event(evlist);
>> - if (!evsel__is_dummy_event(pos)) {
>> - /* Set up dummy event. */
>> - if (evlist__add_dummy(evlist))
>> - return -ENOMEM;
>> - pos = evlist__last(evlist);
>> - evlist__set_tracking_event(evlist, pos);
>> - }
>> + pos = evlist__findnew_tracking_event(evlist);
>> + if (!pos)
>> + return -ENOMEM;
>>
>> /*
>> * Enable the dummy event when the process is forked for
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 7ef43f72098e..4621ddaeb8f3 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1694,6 +1694,23 @@ void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_ev
>> tracking_evsel->tracking = true;
>> }
>>
>> +struct evsel *evlist__findnew_tracking_event(struct evlist *evlist)
>> +{
>> + struct evsel *evsel;
>> +
>> + evsel = evlist__get_tracking_event(evlist);
>> + if (!evsel__is_dummy_event(evsel)) {
>> + /* Set up dummy event. */
>> + if (evlist__add_dummy(evlist))
>
> evlist__add_dummy() does not exclude_kernel so it
> will end up relying on evsel__fallback() to work in
> cases where the user does not have kernel access.
>
> evlist__add_aux_dummy() is probably better suited.
> Consequently perhaps pass system_wide as
> a parameter to evlist__findnew_tracking_event() and
> deal with that all inside evlist__findnew_tracking_event()
>
OK. These two points will be modified in the next version.

Thanks,
Yang

2023-07-12 15:17:32

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 1/5] perf evlist: Export perf_evlist__propagate_maps()

On 12/07/23 17:30, Yang Jihong wrote:
> Hello,
>
> On 2023/7/11 21:12, Adrian Hunter wrote:
>> On 4/07/23 10:42, Yang Jihong wrote:
>>> For dummy events that keep tracking, we may need to modify its cpu_maps.
>>> For example, change the cpu_maps to track side-band events for all CPUS.
>>> Export perf_evlist__propagate_maps () to support this scenario.
>>
>> __perf_evlist__propagate_maps() is quite low-level so it would be better
>> to avoid exporting it.
>>
>>
> Or can we export it via internal/evlist.h?
> Because as mentioned in patch 2:
>
> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
> {
> if (!evsel->system_wide) {
> evsel->system_wide = true;
> if (evlist->needs_map_propagation)
> __perf_evlist__propagate_maps(evlist, evsel);
> }
> }
> This interface needs to invoke __perf_evlist__propagate_maps.

Yes - put it with __perf_evlist__propagate_maps() and export it instead

>
> Thanks,
> Yang


2023-07-12 15:21:18

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

Hello,

On 2023/7/11 21:13, Adrian Hunter wrote:
> On 4/07/23 10:42, Yang Jihong wrote:
>> User space tasks can migrate between CPUs, we need to track side-band
>> events for all CPUs.
>>
>> The specific scenarios are as follows:
>>
>> CPU0 CPU1
>> perf record -C 0 start
>> taskA starts to be created and executed
>> -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>> events only deliver to CPU1
>> ......
>> |
>> migrate to CPU0
>> |
>> Running on CPU0 <----------/
>> ...
>>
>> perf record -C 0 stop
>>
>> Now perf samples the PC of taskA. However, perf does not record the
>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
>> Therefore, the comm and symbols of taskA cannot be parsed.
>>
>> The sys_perf_event_open invoked is as follows:
>>
>> # perf --debug verbose=3 record -e cpu-clock -C 1 true
>> <SNIP>
>> Opening: cpu-clock
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 1
>> size 136
>> { sample_period, sample_freq } 4000
>> sample_type IP|TID|TIME|ID|CPU|PERIOD
>> read_format ID|LOST
>> disabled 1
>> inherit 1
>> freq 1
>> sample_id_all 1
>> exclude_guest 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
>> Opening: dummy:HG
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 1
>> size 136
>> config 0x9
>> { sample_period, sample_freq } 4000
>> sample_type IP|TID|TIME|ID|CPU|PERIOD
>> read_format ID|LOST
>> inherit 1
>> mmap 1
>> comm 1
>> freq 1
>> task 1
>> sample_id_all 1
>> mmap2 1
>> comm_exec 1
>> ksymbol 1
>> bpf_event 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
>> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
>> sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
>> sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
>> sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
>> sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
>> sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
>> sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
>> <SNIP>
>>
>> Signed-off-by: Yang Jihong <[email protected]>
>> ---
>> tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>> 1 file changed, 31 insertions(+)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 8872cd037f2c..69e0d8c75aab 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>> return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>> }
>>
>> +static int record__config_tracking_events(struct record *rec)
>> +{
>> + struct evsel *evsel;
>> + struct evlist *evlist = rec->evlist;
>> + struct record_opts *opts = &rec->opts;
>> +
>> + /*
>> + * User space tasks can migrate between CPUs, so when tracing
>> + * selected CPUs, sideband for all CPUs is still needed.
>> + */
>> + if (opts->target.cpu_list) {
>
> I am not sure if anyone minds doing this by default, but perhaps
> we should say something about it on the perf record man page.
>
Okay, will add comments to the man page.

>> + evsel = evlist__findnew_tracking_event(evlist);
>> + if (!evsel)
>> + return -ENOMEM;
>> +
>> + if (!evsel->core.system_wide) {
>> + evsel->core.system_wide = true;
>> + evsel__set_sample_bit(evsel, TIME);
>> + perf_evlist__propagate_maps(&evlist->core, &evsel->core);
>> + }
>
> Perhaps better to export via internel/evsel.h
>
> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
> {
> if (!evsel->system_wide) {
> evsel->system_wide = true;
> if (evlist->needs_map_propagation)
> __perf_evlist__propagate_maps(evlist, evsel);
> }
> }
>
> As suggested in response to patch 2, perhaps deal with system_wide
> inside evlist__findnew_tracking_event()
>
Okay, I'll modify it as above, so maybe we need to export
perf_evlist__propagate_maps().

As mentioned in the patch 1, __perf_evlist__propagate_maps is low-level
and avoid to export it.
Or can we export perf_evsel__go_system_wide() via through internel/evlist.h?
In this way, we do not need to export perf_evlist__propagate_maps().
If so, would it be more appropriate to call perf_evlist__go_system_wide()?

Thanks,
Yang

2023-07-12 15:23:49

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

On 12/07/23 17:44, Yang Jihong wrote:
> Hello,
>
> On 2023/7/11 21:13, Adrian Hunter wrote:
>> On 4/07/23 10:42, Yang Jihong wrote:
>>> User space tasks can migrate between CPUs, we need to track side-band
>>> events for all CPUs.
>>>
>>> The specific scenarios are as follows:
>>>
>>>           CPU0                                 CPU1
>>>    perf record -C 0 start
>>>                                taskA starts to be created and executed
>>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>>                                     events only deliver to CPU1
>>>                                ......
>>>                                  |
>>>                            migrate to CPU0
>>>                                  |
>>>    Running on CPU0    <----------/
>>>    ...
>>>
>>>    perf record -C 0 stop
>>>
>>> Now perf samples the PC of taskA. However, perf does not record the
>>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>
>>> The sys_perf_event_open invoked is as follows:
>>>
>>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>>    <SNIP>
>>>    Opening: cpu-clock
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1
>>>      size                             136
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>>      read_format                      ID|LOST
>>>      disabled                         1
>>>      inherit                          1
>>>      freq                             1
>>>      sample_id_all                    1
>>>      exclude_guest                    1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>>    Opening: dummy:HG
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1
>>>      size                             136
>>>      config                           0x9
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>>      read_format                      ID|LOST
>>>      inherit                          1
>>>      mmap                             1
>>>      comm                             1
>>>      freq                             1
>>>      task                             1
>>>      sample_id_all                    1
>>>      mmap2                            1
>>>      comm_exec                        1
>>>      ksymbol                          1
>>>      bpf_event                        1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>>    <SNIP>
>>>
>>> Signed-off-by: Yang Jihong <[email protected]>
>>> ---
>>>   tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>>>   1 file changed, 31 insertions(+)
>>>
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index 8872cd037f2c..69e0d8c75aab 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>>>       return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>>   }
>>>   +static int record__config_tracking_events(struct record *rec)
>>> +{
>>> +    struct evsel *evsel;
>>> +    struct evlist *evlist = rec->evlist;
>>> +    struct record_opts *opts = &rec->opts;
>>> +
>>> +    /*
>>> +     * User space tasks can migrate between CPUs, so when tracing
>>> +     * selected CPUs, sideband for all CPUs is still needed.
>>> +     */
>>> +    if (opts->target.cpu_list) {
>>
>> I am not sure if anyone minds doing this by default, but perhaps
>> we should say something about it on the perf record man page.
>>
> Okay, will add comments to the man page.
>
>>> +        evsel = evlist__findnew_tracking_event(evlist);
>>> +        if (!evsel)
>>> +            return -ENOMEM;
>>> +
>>> +        if (!evsel->core.system_wide) {
>>> +            evsel->core.system_wide = true;
>>> +            evsel__set_sample_bit(evsel, TIME);
>>> +            perf_evlist__propagate_maps(&evlist->core, &evsel->core);
>>> +        }
>>
>> Perhaps better to export via internel/evsel.h
>>
>> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
>> {
>>     if (!evsel->system_wide) {
>>         evsel->system_wide = true;
>>         if (evlist->needs_map_propagation)
>>             __perf_evlist__propagate_maps(evlist, evsel);
>>     }
>> }
>>
>> As suggested in response to patch 2, perhaps deal with system_wide
>> inside evlist__findnew_tracking_event()
>>
> Okay, I'll modify it as above, so maybe we need to export perf_evlist__propagate_maps().
>
> As mentioned in the patch 1, __perf_evlist__propagate_maps is low-level and avoid to export it.
> Or can we export perf_evsel__go_system_wide() via through internel/evlist.h?

Yes

> In this way, we do not need to export perf_evlist__propagate_maps().
> If so, would it be more appropriate to call perf_evlist__go_system_wide()?

Sure

>
> Thanks,
> Yang


2023-07-12 21:23:27

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 4/5] perf test: Add test case for record tracking

On Tue, Jul 4, 2023 at 12:44 AM Yang Jihong <[email protected]> wrote:
>
> Add a new test case to record tracking side-band events for all CPUs when
> tracing selected CPUs

We're using "tracking" and "sideband" as synonyms, I think it would be
clearer to just use "sideband" and not use "tracking". I see both
terms in tools/perf/Documentation/perf-intel-pt.txt

Would it be possible to get some consistency here?

Thanks,
Ian

> Test result:
>
> # ./perf test list 2>&1 | grep 'perf record tracking tests'
> 95: perf record tracking tests
> f# ./perf test 95
> 95: perf record tracking tests : Ok
>
> Signed-off-by: Yang Jihong <[email protected]>
> ---
> tools/perf/tests/shell/record_tracking.sh | 44 +++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
> create mode 100755 tools/perf/tests/shell/record_tracking.sh
>
> diff --git a/tools/perf/tests/shell/record_tracking.sh b/tools/perf/tests/shell/record_tracking.sh
> new file mode 100755
> index 000000000000..fe05f4772999
> --- /dev/null
> +++ b/tools/perf/tests/shell/record_tracking.sh
> @@ -0,0 +1,44 @@
> +#!/bin/sh
> +# perf record tracking tests
> +# SPDX-License-Identifier: GPL-2.0
> +
> +set -e
> +
> +err=0
> +perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
> +
> +can_cpu_wide()
> +{
> + if ! perf record -o ${perfdata} -BN --no-bpf-event -e dummy:u -C $1 true 2>&1 >/dev/null
> + then
> + echo "record tracking test [Skipped cannot record cpu$1]"
> + err=2
> + fi
> +
> + rm -f ${perfdata}
> + return $err
> +}
> +
> +test_system_wide_tracking()
> +{
> + # Need CPU 0 and CPU 1
> + can_cpu_wide 0 || return 0
> + can_cpu_wide 1 || return 0
> +
> + # Record on CPU 0 a task running on CPU 1
> + perf record -BN --no-bpf-event -o ${perfdata} -e dummy:u -C 0 -- taskset --cpu-list 1 true
> +
> + # Should get MMAP events from CPU 1
> + mmap_cnt=`perf script -i ${perfdata} --show-mmap-events -C 1 2>/dev/null | grep MMAP | wc -l`
> +
> + rm -f ${perfdata}
> +
> + if [ ${mmap_cnt} -gt 0 ] ; then
> + return 0
> + fi
> +
> + echo "Failed to record MMAP events on CPU 1 when tracing CPU 0"
> + return 1
> +}
> +
> +test_system_wide_tracking
> --
> 2.30.GIT
>

2023-07-13 07:09:55

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH 4/5] perf test: Add test case for record tracking

Hello,

On 2023/7/13 4:48, Ian Rogers wrote:
> On Tue, Jul 4, 2023 at 12:44 AM Yang Jihong <[email protected]> wrote:
>>
>> Add a new test case to record tracking side-band events for all CPUs when
>> tracing selected CPUs
>
> We're using "tracking" and "sideband" as synonyms, I think it would be
> clearer to just use "sideband" and not use "tracking". I see both
> terms in tools/perf/Documentation/perf-intel-pt.txt
>
Okay, will use "sideband events" instead of "tracking events" in the
next version.

Thanks,
Yang