2020-04-29 13:13:08

by Arnaldo Carvalho de Melo

Subject: [RFC PATCHSET v2] Implement --switch-output-event

Hi guys,

Please consider reviewing, this addresses comments by Jiri in
the V1.

Again, the example provided is too simple. Using 'perf probe' to put
probes in specific places in some workload, to then get any other event
close to the time the trigger hits, comes to mind as well. Using the
signal was just to reuse the pre-existing logic and keep the patchkit
small.

One other thing that occurred to me while testing is that this
can be combined with 'perf report/perf script' --switch-off option:

$ perf report -h --switch-off

Usage: perf report [<options>]

--switch-off <event>
Stop considering events after the ocurrence of this event

$

This removes from consideration the events that are recorded after
the --switch-output-event but still get into the ring buffer because we
process the --switch-output-event asynchronously.

It's available at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/switch-output-event

Best regards,

- Arnaldo


Arnaldo Carvalho de Melo (8):
perf record: Move sb_evlist to 'struct record'
perf top: Move sb_evlist to 'struct perf_top'
perf bpf: Decouple creating the evlist from adding the SB event
perf parse-events: Add parse_events_option() variant that creates evlist
perf evlist: Allow reusing the side band thread for more purposes
libsubcmd: Introduce OPT_CALLBACK_SET()
perf record: Introduce --switch-output-event
perf record: Move side band evlist setup to separate routine

tools/lib/subcmd/parse-options.h | 2 +
tools/perf/Documentation/perf-record.txt | 13 ++++
tools/perf/builtin-record.c | 75 ++++++++++++++++++++----
tools/perf/builtin-top.c | 20 +++++--
tools/perf/util/bpf-event.c | 3 +-
tools/perf/util/bpf-event.h | 7 +--
tools/perf/util/evlist.c | 39 +++++++-----
tools/perf/util/evlist.h | 3 +-
tools/perf/util/parse-events.c | 23 ++++++++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/top.h | 2 +-
11 files changed, 151 insertions(+), 37 deletions(-)

--
2.21.1


2020-04-29 13:13:09

by Arnaldo Carvalho de Melo

Subject: [PATCH 1/8] perf record: Move sb_evlist to 'struct record'

From: Arnaldo Carvalho de Melo <[email protected]>

Where state related to a 'perf record' session is grouped.

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2e8011f179f2..2348c4205e59 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -86,6 +86,7 @@ struct record {
struct auxtrace_record *itr;
struct evlist *evlist;
struct perf_session *session;
+ struct evlist *sb_evlist;
int realtime_prio;
bool no_buildid;
bool no_buildid_set;
@@ -1446,7 +1447,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
struct perf_data *data = &rec->data;
struct perf_session *session;
bool disabled = false, draining = false;
- struct evlist *sb_evlist = NULL;
int fd;
float ratio = 0;

@@ -1581,9 +1581,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}

if (!opts->no_bpf_event)
- bpf_event__add_sb_event(&sb_evlist, &session->header.env);
+ bpf_event__add_sb_event(&rec->sb_evlist, &session->header.env);

- if (perf_evlist__start_sb_thread(sb_evlist, &rec->opts.target)) {
+ if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
opts->no_bpf_event = true;
}
@@ -1857,7 +1857,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
perf_session__delete(session);

if (!opts->no_bpf_event)
- perf_evlist__stop_sb_thread(sb_evlist);
+ perf_evlist__stop_sb_thread(rec->sb_evlist);
return status;
}

--
2.21.1

2020-04-29 13:13:19

by Arnaldo Carvalho de Melo

Subject: [PATCH 2/8] perf top: Move sb_evlist to 'struct perf_top'

From: Arnaldo Carvalho de Melo <[email protected]>

Where state related to a 'perf top' session is grouped.

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-top.c | 7 +++----
tools/perf/util/top.h | 2 +-
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6b067a5ba1d5..70e1c732db6a 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1580,7 +1580,6 @@ int cmd_top(int argc, const char **argv)
OPTS_EVSWITCH(&top.evswitch),
OPT_END()
};
- struct evlist *sb_evlist = NULL;
const char * const top_usage[] = {
"perf top [<options>]",
NULL
@@ -1744,9 +1743,9 @@ int cmd_top(int argc, const char **argv)
}

if (!top.record_opts.no_bpf_event)
- bpf_event__add_sb_event(&sb_evlist, &perf_env);
+ bpf_event__add_sb_event(&top.sb_evlist, &perf_env);

- if (perf_evlist__start_sb_thread(sb_evlist, target)) {
+ if (perf_evlist__start_sb_thread(top.sb_evlist, target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
opts->no_bpf_event = true;
}
@@ -1754,7 +1753,7 @@ int cmd_top(int argc, const char **argv)
status = __cmd_top(&top);

if (!opts->no_bpf_event)
- perf_evlist__stop_sb_thread(sb_evlist);
+ perf_evlist__stop_sb_thread(top.sb_evlist);

out_delete_evlist:
evlist__delete(top.evlist);
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index 45dc84ddff37..ff8391208ecd 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -18,7 +18,7 @@ struct perf_session;

struct perf_top {
struct perf_tool tool;
- struct evlist *evlist;
+ struct evlist *evlist, *sb_evlist;
struct record_opts record_opts;
struct annotation_options annotation_opts;
struct evswitch evswitch;
--
2.21.1

2020-04-29 13:13:30

by Arnaldo Carvalho de Melo

Subject: [PATCH 3/8] perf bpf: Decouple creating the evlist from adding the SB event

From: Arnaldo Carvalho de Melo <[email protected]>

Rename bpf_event__add_sb_event() to evlist__add_bpf_sb_event() and
require that the evlist be allocated beforehand.

This will allow the same side band thread and evlist to be used for
multiple purposes, in addition to reacting to PERF_RECORD_BPF_EVENT
records soon after they are generated.
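
As a rough illustration of the ownership change described above (this is
not the perf code; the toy_* names are hypothetical stand-ins for
'struct evlist' and friends), the helper now only appends to a list that
the caller has already allocated, and the caller stays responsible for
deleting it:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for 'struct evsel' and 'struct evlist'. */
struct toy_evsel {
	int id;
	struct toy_evsel *next;
};

struct toy_evlist {
	struct toy_evsel *head;
	int nr_entries;
};

/* The caller allocates the list up front... */
struct toy_evlist *toy_evlist__new(void)
{
	return calloc(1, sizeof(struct toy_evlist));
}

/* ...and the helper only appends: it never allocates or frees the list. */
int toy_evlist__add_sb_event(struct toy_evlist *evlist, int id)
{
	struct toy_evsel *evsel = calloc(1, sizeof(*evsel));

	if (evsel == NULL)
		return -1;

	evsel->id = id;
	evsel->next = evlist->head;
	evlist->head = evsel;
	evlist->nr_entries++;
	return 0;
}

/* The caller that created the list is also the one that deletes it. */
void toy_evlist__delete(struct toy_evlist *evlist)
{
	struct toy_evsel *evsel, *next;

	if (evlist == NULL)
		return;

	for (evsel = evlist->head; evsel != NULL; evsel = next) {
		next = evsel->next;
		free(evsel);
	}
	free(evlist);
}
```

With the allocation moved out of the helper, several users can add their
own entries to one list before the thread that consumes it is started.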

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 17 ++++++++++++++---
tools/perf/builtin-top.c | 15 +++++++++++++--
tools/perf/util/bpf-event.c | 3 +--
tools/perf/util/bpf-event.h | 7 +++----
tools/perf/util/evlist.c | 17 +++--------------
tools/perf/util/evlist.h | 2 +-
6 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2348c4205e59..ed2244847400 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1572,16 +1572,27 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}

+ err = -1;
if (!rec->no_buildid
&& !perf_header__has_feat(&session->header, HEADER_BUILD_ID)) {
pr_err("Couldn't generate buildids. "
"Use --no-buildid to profile anyway.\n");
- err = -1;
goto out_child;
}

- if (!opts->no_bpf_event)
- bpf_event__add_sb_event(&rec->sb_evlist, &session->header.env);
+ if (!opts->no_bpf_event) {
+ rec->sb_evlist = evlist__new();
+
+ if (rec->sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ goto out_child;
+ }
+
+ if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
+ pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
+ goto out_child;
+ }
+ }

if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 70e1c732db6a..de24aced7213 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1742,8 +1742,19 @@ int cmd_top(int argc, const char **argv)
goto out_delete_evlist;
}

- if (!top.record_opts.no_bpf_event)
- bpf_event__add_sb_event(&top.sb_evlist, &perf_env);
+ if (!top.record_opts.no_bpf_event) {
+ top.sb_evlist = evlist__new();
+
+ if (top.sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ goto out_delete_evlist;
+ }
+
+ if (evlist__add_bpf_sb_event(top.sb_evlist, &perf_env)) {
+ pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
+ goto out_delete_evlist;
+ }
+ }

if (perf_evlist__start_sb_thread(top.sb_evlist, target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 0cd41a862952..3742511a08d1 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -509,8 +509,7 @@ static int bpf_event__sb_cb(union perf_event *event, void *data)
return 0;
}

-int bpf_event__add_sb_event(struct evlist **evlist,
- struct perf_env *env)
+int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env)
{
struct perf_event_attr attr = {
.type = PERF_TYPE_SOFTWARE,
diff --git a/tools/perf/util/bpf-event.h b/tools/perf/util/bpf-event.h
index 81fdc88e6c1a..68f315c3df5b 100644
--- a/tools/perf/util/bpf-event.h
+++ b/tools/perf/util/bpf-event.h
@@ -33,8 +33,7 @@ struct btf_node {
#ifdef HAVE_LIBBPF_SUPPORT
int machine__process_bpf(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
-int bpf_event__add_sb_event(struct evlist **evlist,
- struct perf_env *env);
+int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env);
void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
struct perf_env *env,
FILE *fp);
@@ -46,8 +45,8 @@ static inline int machine__process_bpf(struct machine *machine __maybe_unused,
return 0;
}

-static inline int bpf_event__add_sb_event(struct evlist **evlist __maybe_unused,
- struct perf_env *env __maybe_unused)
+static inline int evlist__add_bpf_sb_event(struct evlist *evlist __maybe_unused,
+ struct perf_env *env __maybe_unused)
{
return 0;
}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 82d9f9bb8975..1d0d36da223b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1704,38 +1704,27 @@ struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
return leader;
}

-int perf_evlist__add_sb_event(struct evlist **evlist,
+int perf_evlist__add_sb_event(struct evlist *evlist,
struct perf_event_attr *attr,
perf_evsel__sb_cb_t cb,
void *data)
{
struct evsel *evsel;
- bool new_evlist = (*evlist) == NULL;
-
- if (*evlist == NULL)
- *evlist = evlist__new();
- if (*evlist == NULL)
- return -1;

if (!attr->sample_id_all) {
pr_warning("enabling sample_id_all for all side band events\n");
attr->sample_id_all = 1;
}

- evsel = perf_evsel__new_idx(attr, (*evlist)->core.nr_entries);
+ evsel = perf_evsel__new_idx(attr, evlist->core.nr_entries);
if (!evsel)
goto out_err;

evsel->side_band.cb = cb;
evsel->side_band.data = data;
- evlist__add(*evlist, evsel);
+ evlist__add(evlist, evsel);
return 0;
-
out_err:
- if (new_evlist) {
- evlist__delete(*evlist);
- *evlist = NULL;
- }
return -1;
}

diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f5bd5c386df1..0f02408fff3e 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -107,7 +107,7 @@ int __perf_evlist__add_default_attrs(struct evlist *evlist,

int perf_evlist__add_dummy(struct evlist *evlist);

-int perf_evlist__add_sb_event(struct evlist **evlist,
+int perf_evlist__add_sb_event(struct evlist *evlist,
struct perf_event_attr *attr,
perf_evsel__sb_cb_t cb,
void *data);
--
2.21.1

2020-04-29 13:13:32

by Arnaldo Carvalho de Melo

Subject: [PATCH 4/8] perf parse-events: Add parse_events_option() variant that creates evlist

From: Arnaldo Carvalho de Melo <[email protected]>

For the upcoming --switch-output-event option we want to create the side
band evlist and populate it with the specified events. If the option is
present multiple times, we go on adding to the same evlist. Then, if BPF
tracking is required, we use the first event to set its attr.bpf_event to
get those PERF_RECORD_BPF_EVENT metadata events too.
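
A minimal sketch of the "create on first use, append on repeats" pattern,
assuming a toy fixed-size list and a toy option struct (all toy_* names
are hypothetical, and toy_parse_events() merely stores the string instead
of parsing an event):

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_EVENTS 8

/* Hypothetical stand-in for 'struct evlist'. */
struct toy_list {
	int nr;
	char names[TOY_MAX_EVENTS][32];
};

/* Hypothetical stand-in for 'struct option': value points to a list pointer. */
struct toy_option {
	void *value;
};

/* Append one event name; reject empty or overlong names and a full list. */
int toy_parse_events(struct toy_list *list, const char *str)
{
	if (str == NULL || str[0] == '\0' ||
	    strlen(str) >= sizeof(list->names[0]) ||
	    list->nr == TOY_MAX_EVENTS)
		return -1;

	strcpy(list->names[list->nr++], str);
	return 0;
}

/* Create the list the first time the option is seen, then keep adding to it. */
int toy_parse_events_new_list(const struct toy_option *opt, const char *str)
{
	struct toy_list **listp = opt->value;
	int ret;

	if (*listp == NULL) {
		*listp = calloc(1, sizeof(**listp));
		if (*listp == NULL)
			return -1;
	}

	ret = toy_parse_events(*listp, str);
	if (ret) {
		/* On a parse error drop the list, including earlier entries. */
		free(*listp);
		*listp = NULL;
	}

	return ret;
}
```

Note that, in this sketch, a failure on a later occurrence of the option
also discards the events added by earlier occurrences.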

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/parse-events.c | 23 +++++++++++++++++++++++
tools/perf/util/parse-events.h | 1 +
2 files changed, 24 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 10107747b361..5795f3a8f71c 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2190,6 +2190,29 @@ int parse_events_option(const struct option *opt, const char *str,
return ret;
}

+int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset)
+{
+ struct evlist **evlistp = opt->value;
+ int ret;
+
+ if (*evlistp == NULL) {
+ *evlistp = evlist__new();
+
+ if (*evlistp == NULL) {
+ fprintf(stderr, "Not enough memory to create evlist\n");
+ return -1;
+ }
+ }
+
+ ret = parse_events_option(opt, str, unset);
+ if (ret) {
+ evlist__delete(*evlistp);
+ *evlistp = NULL;
+ }
+
+ return ret;
+}
+
static int
foreach_evsel_in_last_glob(struct evlist *evlist,
int (*func)(struct evsel *evsel,
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 27596cbd0ba0..6ead9661238c 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -31,6 +31,7 @@ bool have_tracepoints(struct list_head *evlist);
const char *event_type(int type);

int parse_events_option(const struct option *opt, const char *str, int unset);
+int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset);
int parse_events(struct evlist *evlist, const char *str,
struct parse_events_error *error);
int parse_events_terms(struct list_head *terms, const char *str);
--
2.21.1

2020-04-29 13:13:52

by Arnaldo Carvalho de Melo

Subject: [PATCH 5/8] perf evlist: Allow reusing the side band thread for more purposes

From: Arnaldo Carvalho de Melo <[email protected]>

I.e. so far we had just one event in that side band thread, a dummy one
with attr.bpf_event set, so that 'perf record' can go ahead and ask the
kernel for further information about BPF programs being loaded.

Allow for more than one event to be there, so that we can use it as
well for the upcoming --switch-output-event feature.
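
To make the idea concrete, here is a small sketch, assuming an array in
place of the real evlist and a simplified callback signature (the toy_*
names are hypothetical), of attaching one callback to every entry so that
any of them can trigger the same action:

```c
#include <stddef.h>

/* Simplified callback type: the real one takes a 'union perf_event *'. */
typedef int (*toy_sb_cb_t)(int event, void *data);

/* Hypothetical stand-in for the side band part of 'struct evsel'. */
struct toy_evsel {
	toy_sb_cb_t cb;
	void *data;
};

/* Attach the same callback and private data to every entry. */
void toy_evlist__set_cb(struct toy_evsel *evsels, size_t nr,
			toy_sb_cb_t cb, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		evsels[i].cb = cb;
		evsels[i].data = data;
	}
}

/* Deliver an event to entry idx; entries without a callback are skipped. */
int toy_evlist__deliver(struct toy_evsel *evsels, size_t nr,
			size_t idx, int event)
{
	if (idx >= nr || evsels[idx].cb == NULL)
		return 0;

	return evsels[idx].cb(event, evsels[idx].data);
}

/* Example callback: count how many events were delivered. */
int toy_count_event(int event, void *data)
{
	(void)event;
	*(int *)data += 1;
	return 0;
}
```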

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/evlist.c | 22 ++++++++++++++++++++++
tools/perf/util/evlist.h | 1 +
2 files changed, 23 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1d0d36da223b..849058766757 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1777,6 +1777,19 @@ static void *perf_evlist__poll_thread(void *arg)
return NULL;
}

+void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ evsel->core.attr.sample_id_all = 1;
+ evsel->core.attr.watermark = 1;
+ evsel->core.attr.wakeup_watermark = 1;
+ evsel->side_band.cb = cb;
+ evsel->side_band.data = data;
+ }
+}
+
int perf_evlist__start_sb_thread(struct evlist *evlist,
struct target *target)
{
@@ -1788,6 +1801,15 @@ int perf_evlist__start_sb_thread(struct evlist *evlist,
if (perf_evlist__create_maps(evlist, target))
goto out_delete_evlist;

+ if (evlist->core.nr_entries > 1) {
+ bool can_sample_identifier = perf_can_sample_identifier();
+
+ evlist__for_each_entry(evlist, counter)
+ perf_evsel__set_sample_id(counter, can_sample_identifier);
+
+ perf_evlist__set_id_pos(evlist);
+ }
+
evlist__for_each_entry(evlist, counter) {
if (evsel__open(counter, evlist->core.cpus,
evlist->core.threads) < 0)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0f02408fff3e..1a8a979ae137 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -111,6 +111,7 @@ int perf_evlist__add_sb_event(struct evlist *evlist,
struct perf_event_attr *attr,
perf_evsel__sb_cb_t cb,
void *data);
+void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data);
int perf_evlist__start_sb_thread(struct evlist *evlist,
struct target *target);
void perf_evlist__stop_sb_thread(struct evlist *evlist);
--
2.21.1

2020-04-29 13:16:03

by Arnaldo Carvalho de Melo

Subject: [PATCH 6/8] libsubcmd: Introduce OPT_CALLBACK_SET()

From: Arnaldo Carvalho de Melo <[email protected]>

To register that an option was set, like with the upcoming 'perf record
--switch-output-event' one.
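
The sketch below shows the general idea with a toy option table; it is
not the libsubcmd parser, and every toy_* name in it is hypothetical.
The only point is that an option may carry an optional 'bool *set' that
the parser flips before invoking the callback:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced version of 'struct option'. */
struct toy_option {
	const char *long_name;
	void *value;
	bool *set;	/* optional: flipped to true when the option is seen */
	int (*callback)(const struct toy_option *opt, const char *arg);
};

#define TOY_OPT_CALLBACK_SET(l, v, os, f) \
	{ .long_name = (l), .value = (v), .set = (os), .callback = (f) }

/* Example callback: remember the argument string. */
int toy_store_arg(const struct toy_option *opt, const char *arg)
{
	*(const char **)opt->value = arg;
	return 0;
}

/* Handle one "--name arg" pair; returns -1 for an unknown option. */
int toy_parse_one(const struct toy_option *opts, size_t nr,
		  const char *name, const char *arg)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(opts[i].long_name, name) != 0)
			continue;

		if (opts[i].set != NULL)
			*opts[i].set = true;

		return opts[i].callback(&opts[i], arg);
	}

	return -1;
}
```

Later code can then test the flag instead of inferring from the value
whether the user passed the option.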

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/lib/subcmd/parse-options.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/lib/subcmd/parse-options.h b/tools/lib/subcmd/parse-options.h
index af9def589863..d2414144eb8c 100644
--- a/tools/lib/subcmd/parse-options.h
+++ b/tools/lib/subcmd/parse-options.h
@@ -151,6 +151,8 @@ struct option {
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = "time", .help = (h), .callback = parse_opt_approxidate_cb }
#define OPT_CALLBACK(s, l, v, a, h, f) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f) }
+#define OPT_CALLBACK_SET(s, l, v, os, a, h, f) \
+ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f), .set = check_vtype(os, bool *)}
#define OPT_CALLBACK_NOOPT(s, l, v, a, h, f) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f), .flags = PARSE_OPT_NOARG }
#define OPT_CALLBACK_DEFAULT(s, l, v, a, h, f, d) \
--
2.21.1

2020-04-29 13:16:03

by Arnaldo Carvalho de Melo

Subject: [PATCH 7/8] perf record: Introduce --switch-output-event

From: Arnaldo Carvalho de Melo <[email protected]>

Now we can use it with --overwrite to have a flight recorder mode that
gets snapshot requests from arbitrary events that are processed in the
side band thread together with the PERF_RECORD_BPF_EVENT processing.

Example:

To collect scheduler events until a recvmmsg syscall happens, system
wide:

[root@five a]# rm -f perf.data.2020042717*
[root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042717585458 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042717590235 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042717590398 ]
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2020042717590511 ]
[ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ]

So in the above case we had 3 snapshots, the fourth was forced by
control+C:

[root@five a]# ls -la
total 20440
drwxr-xr-x. 2 root root 4096 Apr 27 17:59 .
dr-xr-x---. 12 root root 4096 Apr 27 17:46 ..
-rw-------. 1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458
-rw-------. 1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235
-rw-------. 1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398
-rw-------. 1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511
[root@five a]#

One can make this more precise by adding the switch output event to the
main -e events list. Since the switch is done asynchronously, a few events
after the signal event will appear in the snapshots, as can be seen
with:

[root@five a]# rm -f perf.data.20200427175*
[root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042718024203 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042718024301 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042718024484 ]
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2020042718024562 ]
[ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ]
[root@five a]# perf script -i perf.data.2020042718024203 | tail -15
PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586...
swapper 0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=...
NetworkManager 1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1...
swapper 0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=...
swapper 0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=...
NetworkManager 1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1
kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894...
swapper 0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=...
NetworkManager 1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage...
perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827...
swapper 0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1...
swapper 0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/...
kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738...
kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203...
perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582...
[root@five a]#
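
The signalling mechanism itself can be sketched in isolation: a helper
thread sends SIGUSR2 to the main thread with pthread_kill(), and the
main thread notices it in a handler. This is a standalone illustration,
not the perf code; the toy_* names are hypothetical, and the main thread
here blocks SIGUSR2 and waits in sigsuspend() only to keep the sketch
deterministic:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the part of 'struct record' used here. */
struct toy_record {
	pthread_t thread_id;	/* thread that should receive SIGUSR2 */
};

static volatile sig_atomic_t toy_switch_requested;

static void toy_sigusr2_handler(int sig)
{
	(void)sig;
	toy_switch_requested = 1;
}

/* Runs in the helper thread, like a side band callback would. */
static void *toy_side_band_thread(void *data)
{
	struct toy_record *rec = data;

	pthread_kill(rec->thread_id, SIGUSR2);
	return NULL;
}

/* Returns 1 once the main thread has seen the request, -1 on setup errors. */
int toy_wait_for_switch_request(void)
{
	struct toy_record rec = { .thread_id = pthread_self() };
	sigset_t block, old, wait_mask;
	struct sigaction sa;
	pthread_t side;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = toy_sigusr2_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR2, &sa, NULL) != 0)
		return -1;

	/* Keep SIGUSR2 pending until we explicitly wait for it. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR2);
	if (pthread_sigmask(SIG_BLOCK, &block, &old) != 0)
		return -1;

	toy_switch_requested = 0;
	if (pthread_create(&side, NULL, toy_side_band_thread, &rec) != 0) {
		pthread_sigmask(SIG_SETMASK, &old, NULL);
		return -1;
	}
	pthread_join(side, NULL);

	/* Atomically unblock SIGUSR2 and sleep until the handler has run. */
	wait_mask = old;
	sigdelset(&wait_mask, SIGUSR2);
	while (!toy_switch_requested)
		sigsuspend(&wait_mask);

	pthread_sigmask(SIG_SETMASK, &old, NULL);
	return toy_switch_requested;
}
```

Because the signal is only acted upon once the main thread gets to it,
events that arrive in between still land in the snapshot, which matches
the trailing events visible in the 'perf script' output above.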

Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 13 ++++++++
tools/perf/builtin-record.c | 41 +++++++++++++++++++++---
2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 6e8b4649307c..561ef55743e2 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -556,6 +556,19 @@ overhead. You can still switch them on with:

--switch-output --no-no-buildid --no-no-buildid-cache

+--switch-output-event::
+Events that will cause the switch of the perf.data file, auto-selecting
+--switch-output=signal, the results are similar as internally the side band
+thread will also send a SIGUSR2 to the main one.
+
+Uses the same syntax as --event, it will just not be recorded, serving only to
+switch the perf.data file as soon as the --switch-output event is processed by
+a separate sideband thread.
+
+This sideband thread is also used to other purposes, like processing the
+PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF
+information, etc.
+
--switch-max-files=N::

When rotating perf.data with --switch-output, only keep N files.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ed2244847400..7a6a89972691 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -87,7 +87,9 @@ struct record {
struct evlist *evlist;
struct perf_session *session;
struct evlist *sb_evlist;
+ pthread_t thread_id;
int realtime_prio;
+ bool switch_output_event_set;
bool no_buildid;
bool no_buildid_set;
bool no_buildid_cache;
@@ -1436,6 +1438,13 @@ static int record__synthesize(struct record *rec, bool tail)
return err;
}

+static int record__process_signal_event(union perf_event *event __maybe_unused, void *data)
+{
+ struct record *rec = data;
+ pthread_kill(rec->thread_id, SIGUSR2);
+ return 0;
+}
+
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -1580,12 +1589,24 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}

- if (!opts->no_bpf_event) {
- rec->sb_evlist = evlist__new();
+ if (rec->sb_evlist != NULL) {
+ /*
+ * We get here if --switch-output-event populated the
+ * sb_evlist, so associate a callback that will send a SIGUSR2
+ * to the main thread.
+ */
+ evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
+ rec->thread_id = pthread_self();
+ }

+ if (!opts->no_bpf_event) {
if (rec->sb_evlist == NULL) {
- pr_err("Couldn't create side band evlist.\n.");
- goto out_child;
+ rec->sb_evlist = evlist__new();
+
+ if (rec->sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ goto out_child;
+ }
}

if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
@@ -2179,10 +2200,19 @@ static int switch_output_setup(struct record *rec)
};
unsigned long val;

+ /*
+ * If we're using --switch-output-events, then we imply its
+ * --switch-output=signal, as we'll send a SIGUSR2 from the side band
+ * thread to its parent.
+ */
+ if (rec->switch_output_event_set)
+ goto do_signal;
+
if (!s->set)
return 0;

if (!strcmp(s->str, "signal")) {
+do_signal:
s->signal = true;
pr_debug("switch-output with SIGUSR2 signal\n");
goto enabled;
@@ -2440,6 +2470,9 @@ static struct option __record_options[] = {
&record.switch_output.set, "signal or size[BKMG] or time[smhd]",
"Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold",
"signal"),
+ OPT_CALLBACK_SET(0, "switch-output-event", &record.sb_evlist, &record.switch_output_event_set, "switch output event",
+ "switch output event selector. use 'perf list' to list available events",
+ parse_events_option_new_evlist),
OPT_INTEGER(0, "switch-max-files", &record.switch_output.num_files,
"Limit number of switch output generated files"),
OPT_BOOLEAN(0, "dry-run", &dry_run,
--
2.21.1

2020-04-29 13:16:28

by Arnaldo Carvalho de Melo

Subject: [PATCH 8/8] perf record: Move side band evlist setup to separate routine

From: Arnaldo Carvalho de Melo <[email protected]>

It is quite big by now, move that code to a separate
record__setup_sb_evlist() routine.

Suggested-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 71 +++++++++++++++++++++----------------
1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7a6a89972691..bb3d30616bf3 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1445,6 +1445,44 @@ static int record__process_signal_event(union perf_event *event __maybe_unused,
return 0;
}

+static int record__setup_sb_evlist(struct record *rec)
+{
+ struct record_opts *opts = &rec->opts;
+
+ if (rec->sb_evlist != NULL) {
+ /*
+ * We get here if --switch-output-event populated the
+ * sb_evlist, so associate a callback that will send a SIGUSR2
+ * to the main thread.
+ */
+ evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
+ rec->thread_id = pthread_self();
+ }
+
+ if (!opts->no_bpf_event) {
+ if (rec->sb_evlist == NULL) {
+ rec->sb_evlist = evlist__new();
+
+ if (rec->sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ return -1;
+ }
+ }
+
+ if (evlist__add_bpf_sb_event(rec->sb_evlist, &rec->session->header.env)) {
+ pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
+ return -1;
+ }
+ }
+
+ if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
+ pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
+ opts->no_bpf_event = true;
+ }
+
+ return 0;
+}
+
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -1589,36 +1627,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}

- if (rec->sb_evlist != NULL) {
- /*
- * We get here if --switch-output-event populated the
- * sb_evlist, so associate a callback that will send a SIGUSR2
- * to the main thread.
- */
- evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
- rec->thread_id = pthread_self();
- }
-
- if (!opts->no_bpf_event) {
- if (rec->sb_evlist == NULL) {
- rec->sb_evlist = evlist__new();
-
- if (rec->sb_evlist == NULL) {
- pr_err("Couldn't create side band evlist.\n.");
- goto out_child;
- }
- }
-
- if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
- pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
- goto out_child;
- }
- }
-
- if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
- pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
- opts->no_bpf_event = true;
- }
+ err = record__setup_sb_evlist(rec);
+ if (err)
+ goto out_child;

err = record__synthesize(rec, false);
if (err < 0)
--
2.21.1

2020-04-29 17:30:40

by Ian Rogers

Subject: Re: [PATCH 7/8] perf record: Introduce --switch-output-event

On Wed, Apr 29, 2020 at 6:14 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> From: Arnaldo Carvalho de Melo <[email protected]>
>
> Now we can use it with --overwrite to have a flight recorder mode that
> gets snapshot requests from arbitrary events that are processed in the
> side band thread together with the PERF_RECORD_BPF_EVENT processing.
>
> Example:
>
> To collect scheduler events until a recvmmsg syscall happens, system
> wide:
>
> [root@five a]# rm -f perf.data.2020042717*
> [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2020042717585458 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2020042717590235 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2020042717590398 ]
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Dump perf.data.2020042717590511 ]
> [ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ]
>
> So in the above case we had 3 snapshots, the fourth was forced by
> control+C:
>
> [root@five a]# ls -la
> total 20440
> drwxr-xr-x. 2 root root 4096 Apr 27 17:59 .
> dr-xr-x---. 12 root root 4096 Apr 27 17:46 ..
> -rw-------. 1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458
> -rw-------. 1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235
> -rw-------. 1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398
> -rw-------. 1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511
> [root@five a]#
>
> One can make this more precise by adding the switch output event to the
> main -e events list: since the switch is handled asynchronously, a few
> events after the signal event will appear in the snapshots, as can be
> seen with:
>
> [root@five a]# rm -f perf.data.20200427175*
> [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2020042718024203 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2020042718024301 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2020042718024484 ]
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Dump perf.data.2020042718024562 ]
> [ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ]
> [root@five a]# perf script -i perf.data.2020042718024203 | tail -15
> PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586...
> swapper 0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=...
> NetworkManager 1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1...
> swapper 0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=...
> swapper 0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=...
> NetworkManager 1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1
> kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894...
> swapper 0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=...
> NetworkManager 1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage...
> perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827...
> swapper 0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1...
> swapper 0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/...
> kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738...
> kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203...
> perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582...
> [root@five a]#
>
> Cc: Adrian Hunter <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Song Liu <[email protected]>
> Cc: Wang Nan <[email protected]>
> Link: http://lore.kernel.org/lkml/[email protected]
> Link: http://lore.kernel.org/lkml/[email protected]
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ---
> tools/perf/Documentation/perf-record.txt | 13 ++++++++
> tools/perf/builtin-record.c | 41 +++++++++++++++++++++---
> 2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 6e8b4649307c..561ef55743e2 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -556,6 +556,19 @@ overhead. You can still switch them on with:
>
> --switch-output --no-no-buildid --no-no-buildid-cache
>
> +--switch-output-event::
> +Events that will cause the switch of the perf.data file, auto-selecting
> +--switch-output=signal, the results are similar as internally the side band
> +thread will also send a SIGUSR2 to the main one.

I found this paragraph a little hard to follow; perhaps:
A list of events that when they occur cause the output perf.data file
to be ended and a new one created. The signal event,
--switch-output=signal, is auto-selected as SIGUSR2 is used internally
by the thread monitoring the events.

> +Uses the same syntax as --event, it will just not be recorded, serving only to
> +switch the perf.data file as soon as the --switch-output event is processed by
> +a separate sideband thread.
> +
> +This sideband thread is also used to other purposes, like processing the
> +PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF
> +information, etc.
> +
> --switch-max-files=N::
>
> When rotating perf.data with --switch-output, only keep N files.
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index ed2244847400..7a6a89972691 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -87,7 +87,9 @@ struct record {
> struct evlist *evlist;
> struct perf_session *session;
> struct evlist *sb_evlist;
> + pthread_t thread_id;
> int realtime_prio;
> + bool switch_output_event_set;
> bool no_buildid;
> bool no_buildid_set;
> bool no_buildid_cache;
> @@ -1436,6 +1438,13 @@ static int record__synthesize(struct record *rec, bool tail)
> return err;
> }
>
> +static int record__process_signal_event(union perf_event *event __maybe_unused, void *data)
> +{
> + struct record *rec = data;
> + pthread_kill(rec->thread_id, SIGUSR2);
> + return 0;
> +}
> +
> static int __cmd_record(struct record *rec, int argc, const char **argv)
> {
> int err;
> @@ -1580,12 +1589,24 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> goto out_child;
> }
>
> - if (!opts->no_bpf_event) {
> - rec->sb_evlist = evlist__new();
> + if (rec->sb_evlist != NULL) {
> + /*
> + * We get here if --switch-output-event populated the
> + * sb_evlist, so associate a callback that will send a SIGUSR2
> + * to the main thread.
> + */
> + evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
> + rec->thread_id = pthread_self();
> + }
>
> + if (!opts->no_bpf_event) {
> if (rec->sb_evlist == NULL) {
> - pr_err("Couldn't create side band evlist.\n.");
> - goto out_child;
> + rec->sb_evlist = evlist__new();
> +
> + if (rec->sb_evlist == NULL) {
> + pr_err("Couldn't create side band evlist.\n.");
> + goto out_child;
> + }
> }
>
> if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
> @@ -2179,10 +2200,19 @@ static int switch_output_setup(struct record *rec)
> };
> unsigned long val;
>
> + /*
> + * If we're using --switch-output-events, then we imply its
> + * --switch-output=signal, as we'll send a SIGUSR2 from the side band
> + * thread to its parent.
> + */
> + if (rec->switch_output_event_set)
> + goto do_signal;
> +
> if (!s->set)
> return 0;
>
> if (!strcmp(s->str, "signal")) {
> +do_signal:
> s->signal = true;
> pr_debug("switch-output with SIGUSR2 signal\n");
> goto enabled;
> @@ -2440,6 +2470,9 @@ static struct option __record_options[] = {
> &record.switch_output.set, "signal or size[BKMG] or time[smhd]",
> "Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold",
> "signal"),
> + OPT_CALLBACK_SET(0, "switch-output-event", &record.sb_evlist, &record.switch_output_event_set, "switch output event",
> + "switch output event selector. use 'perf list' to list available events",

Perhaps:
"A list of events, see 'perf list', that when they occur cause the end
of one perf.data file and the creation of another"

Thanks,
Ian

> + parse_events_option_new_evlist),
> OPT_INTEGER(0, "switch-max-files", &record.switch_output.num_files,
> "Limit number of switch output generated files"),
> OPT_BOOLEAN(0, "dry-run", &dry_run,
> --
> 2.21.1
>
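
The mechanism discussed above (a side band thread sending SIGUSR2 to the main thread, which treats it as a snapshot request) can be sketched outside of perf. This is only an illustration under stated assumptions: the demo_* names, the counter and the single simulated trigger are hypothetical and are not part of the patch; only pthread_kill() with SIGUSR2 mirrors what record__process_signal_event() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Bumped by the SIGUSR2 handler; stands in for perf's switch-output trigger. */
static volatile sig_atomic_t demo_snapshot_requests;

static void demo_on_sigusr2(int sig)
{
	(void)sig;
	demo_snapshot_requests++;
}

/* Hypothetical stand-in for the bits of 'struct record' used here. */
struct demo_rec {
	pthread_t main_thread;	/* plays the role of rec->thread_id */
};

/* Plays the role of record__process_signal_event(): poke the main thread. */
static int demo_process_trigger(void *data)
{
	struct demo_rec *rec = data;

	return pthread_kill(rec->main_thread, SIGUSR2);
}

static void *demo_sideband_thread(void *arg)
{
	/* Pretend one trigger event was read from the side band ring buffer. */
	demo_process_trigger(arg);
	return NULL;
}

/* Returns the number of snapshot requests seen by the main thread, or -1. */
int demo_run(void)
{
	struct demo_rec rec = { .main_thread = pthread_self() };
	struct sigaction sa;
	sigset_t block, old, wait_mask;
	pthread_t sb;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_on_sigusr2;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR2, &sa, NULL))
		return -1;

	/* Keep SIGUSR2 blocked so it is only delivered inside sigsuspend(). */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR2);
	pthread_sigmask(SIG_BLOCK, &block, &old);
	wait_mask = old;
	sigdelset(&wait_mask, SIGUSR2);

	demo_snapshot_requests = 0;
	if (pthread_create(&sb, NULL, demo_sideband_thread, &rec)) {
		pthread_sigmask(SIG_SETMASK, &old, NULL);
		return -1;
	}

	while (!demo_snapshot_requests)
		sigsuspend(&wait_mask);	/* atomically unblock SIGUSR2 and wait */

	pthread_join(sb, NULL);
	pthread_sigmask(SIG_SETMASK, &old, NULL);
	return demo_snapshot_requests;
}
```

Calling demo_run() from a main() built with -pthread should report one request. Because the trigger is handled in a different thread from the one writing the data, anything recorded between the trigger and the signal delivery still ends up in the snapshot, which is the asynchrony the commit message mentions.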

2020-04-30 09:08:57

by Jiri Olsa

Subject: Re: [PATCH 3/8] perf bpf: Decouple creating the evlist from adding the SB event

On Wed, Apr 29, 2020 at 10:11:01AM -0300, Arnaldo Carvalho de Melo wrote:

SNIP

> -int perf_evlist__add_sb_event(struct evlist **evlist,
> +int perf_evlist__add_sb_event(struct evlist *evlist,
> struct perf_event_attr *attr,
> perf_evsel__sb_cb_t cb,
> void *data)
> {
> struct evsel *evsel;
> - bool new_evlist = (*evlist) == NULL;
> -
> - if (*evlist == NULL)
> - *evlist = evlist__new();
> - if (*evlist == NULL)
> - return -1;
>
> if (!attr->sample_id_all) {
> pr_warning("enabling sample_id_all for all side band events\n");
> attr->sample_id_all = 1;
> }
>
> - evsel = perf_evsel__new_idx(attr, (*evlist)->core.nr_entries);
> + evsel = perf_evsel__new_idx(attr, evlist->core.nr_entries);
> if (!evsel)
> goto out_err;

we can return -1 right here

jirka

>
> evsel->side_band.cb = cb;
> evsel->side_band.data = data;
> - evlist__add(*evlist, evsel);
> + evlist__add(evlist, evsel);
> return 0;
> -
> out_err:
> - if (new_evlist) {
> - evlist__delete(*evlist);
> - *evlist = NULL;
> - }
> return -1;
> }
>
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index f5bd5c386df1..0f02408fff3e 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -107,7 +107,7 @@ int __perf_evlist__add_default_attrs(struct evlist *evlist,
>
> int perf_evlist__add_dummy(struct evlist *evlist);
>
> -int perf_evlist__add_sb_event(struct evlist **evlist,
> +int perf_evlist__add_sb_event(struct evlist *evlist,
> struct perf_event_attr *attr,
> perf_evsel__sb_cb_t cb,
> void *data);
> --
> 2.21.1
>

2020-04-30 09:09:17

by Jiri Olsa

Subject: Re: [RFC PATCHSET v2] Implement --switch-output-event

On Wed, Apr 29, 2020 at 10:10:58AM -0300, Arnaldo Carvalho de Melo wrote:
> Hi guys,
>
> Please consider reviewing, this addresses comments by Jiri in
> the V1.

small nit posted, but overall looks good

Acked-by: Jiri Olsa <[email protected]>

jirka

>
> Again, the example provided is too simple, using 'perf probe' to
> put probes in specific places in some workload to then get any other
> event close to the time the trigger hits comes to mind as well, using
> the signal was just to reuse the pre-existing logic and keep the
> patchkit small.
>
> One other thing that occurred to me while testing is that this
> can be combined with 'perf report/perf script' --switch-off option:
>
> $ perf report -h --switch-off
>
> Usage: perf report [<options>]
>
> --switch-off <event>
> Stop considering events after the ocurrence of this event
>
> $
>
> To remove from consideration the events that end up being
> recorded in the ring buffer after the --switch-output-event but gets in
> the ring buffer because we process the --switch-output-event
> asynchronously.
>
> Its available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/switch-output-event
>
> Best regards,
>
> - Arnaldo
>
>
> Arnaldo Carvalho de Melo (8):
> perf record: Move sb_evlist to 'struct record'
> perf top: Move sb_evlist to 'struct perf_top'
> perf bpf: Decouple creating the evlist from adding the SB event
> perf parse-events: Add parse_events_option() variant that creates evlist
> perf evlist: Allow reusing the side band thread for more purposes
> libsubcmd: Introduce OPT_CALLBACK_SET()
> perf record: Introduce --switch-output-event
> perf record: Move side band evlist setup to separate routine
>
> tools/lib/subcmd/parse-options.h | 2 +
> tools/perf/Documentation/perf-record.txt | 13 ++++
> tools/perf/builtin-record.c | 75 ++++++++++++++++++++----
> tools/perf/builtin-top.c | 20 +++++--
> tools/perf/util/bpf-event.c | 3 +-
> tools/perf/util/bpf-event.h | 7 +--
> tools/perf/util/evlist.c | 39 +++++++-----
> tools/perf/util/evlist.h | 3 +-
> tools/perf/util/parse-events.c | 23 ++++++++
> tools/perf/util/parse-events.h | 1 +
> tools/perf/util/top.h | 2 +-
> 11 files changed, 151 insertions(+), 37 deletions(-)
>
> --
> 2.21.1
>

2020-04-30 13:43:12

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 3/8] perf bpf: Decouple creating the evlist from adding the SB event

Em Thu, Apr 30, 2020 at 11:04:23AM +0200, Jiri Olsa escreveu:
> On Wed, Apr 29, 2020 at 10:11:01AM -0300, Arnaldo Carvalho de Melo wrote:
>
> SNIP
>
> > -int perf_evlist__add_sb_event(struct evlist **evlist,
> > +int perf_evlist__add_sb_event(struct evlist *evlist,
> > struct perf_event_attr *attr,
> > perf_evsel__sb_cb_t cb,
> > void *data)
> > {
> > struct evsel *evsel;
> > - bool new_evlist = (*evlist) == NULL;
> > -
> > - if (*evlist == NULL)
> > - *evlist = evlist__new();
> > - if (*evlist == NULL)
> > - return -1;
> >
> > if (!attr->sample_id_all) {
> > pr_warning("enabling sample_id_all for all side band events\n");
> > attr->sample_id_all = 1;
> > }
> >
> > - evsel = perf_evsel__new_idx(attr, (*evlist)->core.nr_entries);
> > + evsel = perf_evsel__new_idx(attr, evlist->core.nr_entries);
> > if (!evsel)
> > goto out_err;
>
> we can return -1 right here

Yeah, I was just trying to keep the patch minimal. I'll remove the
out_err label and its goto and return directly.

Thanks,

- Arnaldo

> jirka
>
> >
> > evsel->side_band.cb = cb;
> > evsel->side_band.data = data;
> > - evlist__add(*evlist, evsel);
> > + evlist__add(evlist, evsel);
> > return 0;
> > -
> > out_err:
> > - if (new_evlist) {
> > - evlist__delete(*evlist);
> > - *evlist = NULL;
> > - }
> > return -1;
> > }
> >
> > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > index f5bd5c386df1..0f02408fff3e 100644
> > --- a/tools/perf/util/evlist.h
> > +++ b/tools/perf/util/evlist.h
> > @@ -107,7 +107,7 @@ int __perf_evlist__add_default_attrs(struct evlist *evlist,
> >
> > int perf_evlist__add_dummy(struct evlist *evlist);
> >
> > -int perf_evlist__add_sb_event(struct evlist **evlist,
> > +int perf_evlist__add_sb_event(struct evlist *evlist,
> > struct perf_event_attr *attr,
> > perf_evsel__sb_cb_t cb,
> > void *data);
> > --
> > 2.21.1
> >
>

--

- Arnaldo
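
The shape agreed on above (the caller owns the evlist, so a failed allocation has nothing to undo and can return right away) might look roughly like the sketch below. The demo_* types are hypothetical stand-ins for perf's struct evsel and struct evlist, and calloc() takes the place of perf_evsel__new_idx(); this is not the code that was applied.

```c
#include <stdlib.h>

typedef int (*demo_sb_cb_t)(void *data);

/* Hypothetical stand-ins for perf's 'struct evsel' and 'struct evlist'. */
struct demo_evsel {
	demo_sb_cb_t cb;
	void *data;
	struct demo_evsel *next;
};

struct demo_evlist {
	struct demo_evsel *head;
	int nr_entries;
};

/* The evlist is created by the caller, so it is never freed in here. */
int demo_add_sb_event(struct demo_evlist *evlist, demo_sb_cb_t cb, void *data)
{
	struct demo_evsel *evsel = calloc(1, sizeof(*evsel));

	if (!evsel)
		return -1;	/* nothing to clean up: no out_err label needed */

	evsel->cb = cb;
	evsel->data = data;
	evsel->next = evlist->head;
	evlist->head = evsel;
	evlist->nr_entries++;
	return 0;
}
```

With the evlist no longer created inside the function, the single-exit error path has no work left to do, which is why the label can go.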

2020-05-01 11:30:00

by Jiri Olsa

Subject: Re: [PATCH 5/8] perf evlist: Allow reusing the side band thread for more purposes

On Wed, Apr 29, 2020 at 10:11:03AM -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <[email protected]>
>
> I.e. so far we had just one event in that side band thread, a dummy one
> with attr.bpf_event set, so that 'perf record' can go ahead and ask the
> kernel for further information about BPF programs being loaded.
>
> Allow for more than one event to be there, so that we can use it as
> well for the upcoming --switch-output-event feature.
>
> Cc: Adrian Hunter <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Song Liu <[email protected]>
> Link: http://lore.kernel.org/lkml/[email protected]
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ---
> tools/perf/util/evlist.c | 22 ++++++++++++++++++++++
> tools/perf/util/evlist.h | 1 +
> 2 files changed, 23 insertions(+)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1d0d36da223b..849058766757 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1777,6 +1777,19 @@ static void *perf_evlist__poll_thread(void *arg)
> return NULL;
> }
>
> +void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data)
> +{
> + struct evsel *evsel;
> +
> + evlist__for_each_entry(evlist, evsel) {
> + evsel->core.attr.sample_id_all = 1;
> + evsel->core.attr.watermark = 1;
> + evsel->core.attr.wakeup_watermark = 1;
> + evsel->side_band.cb = cb;
> + evsel->side_band.data = data;
> + }
> +}
> +
> int perf_evlist__start_sb_thread(struct evlist *evlist,
> struct target *target)
> {
> @@ -1788,6 +1801,15 @@ int perf_evlist__start_sb_thread(struct evlist *evlist,
> if (perf_evlist__create_maps(evlist, target))
> goto out_delete_evlist;
>
> + if (evlist->core.nr_entries > 1) {
> + bool can_sample_identifier = perf_can_sample_identifier();

I just found that this breaks the python binding, because
perf_can_sample_identifier() is defined in util/record.c, which is not
linked into python/perf.so:

19: 'import perf' in python :
--- start ---
test child forked, pid 1808205
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: python/perf.so: undefined symbol: perf_can_sample_identifier
test child finished with -1
---- end ----
'import perf' in python: FAILED!

jirka

2020-05-01 11:33:37

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 5/8] perf evlist: Allow reusing the side band thread for more purposes



On May 1, 2020 8:25:52 AM GMT-03:00, Jiri Olsa <[email protected]> wrote:
>On Wed, Apr 29, 2020 at 10:11:03AM -0300, Arnaldo Carvalho de Melo
>wrote:
>> From: Arnaldo Carvalho de Melo <[email protected]>
>>
>> I.e. so far we had just one event in that side band thread, a dummy
>one
>> with attr.bpf_event set, so that 'perf record' can go ahead and ask
>the
>> kernel for further information about BPF programs being loaded.
>>
>> Allow for more than one event to be there, so that we can use it as
>> well for the upcoming --switch-output-event feature.
>>
>> Cc: Adrian Hunter <[email protected]>
>> Cc: Jiri Olsa <[email protected]>
>> Cc: Namhyung Kim <[email protected]>
>> Cc: Song Liu <[email protected]>
>> Link:
>http://lore.kernel.org/lkml/[email protected]
>> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>> ---
>> tools/perf/util/evlist.c | 22 ++++++++++++++++++++++
>> tools/perf/util/evlist.h | 1 +
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 1d0d36da223b..849058766757 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1777,6 +1777,19 @@ static void *perf_evlist__poll_thread(void
>*arg)
>> return NULL;
>> }
>>
>> +void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb,
>void *data)
>> +{
>> + struct evsel *evsel;
>> +
>> + evlist__for_each_entry(evlist, evsel) {
>> + evsel->core.attr.sample_id_all = 1;
>> + evsel->core.attr.watermark = 1;
>> + evsel->core.attr.wakeup_watermark = 1;
>> + evsel->side_band.cb = cb;
>> + evsel->side_band.data = data;
>> + }
>> +}
>> +
>> int perf_evlist__start_sb_thread(struct evlist *evlist,
>> struct target *target)
>> {
>> @@ -1788,6 +1801,15 @@ int perf_evlist__start_sb_thread(struct evlist
>*evlist,
>> if (perf_evlist__create_maps(evlist, target))
>> goto out_delete_evlist;
>>
>> + if (evlist->core.nr_entries > 1) {
>> + bool can_sample_identifier = perf_can_sample_identifier();
>
>I just found this breaks python, because perf_can_sample_identifier
>is defined in util/record.c

Yeah, I noticed it too, will fix

>
> 19: 'import perf' in python :
> --- start ---
> test child forked, pid 1808205
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ImportError: python/perf.so: undefined symbol:
>perf_can_sample_identifier
> test child finished with -1
> ---- end ----
> 'import perf' in python: FAILED!
>
>jirka


Subject: [tip: perf/core] perf evlist: Allow reusing the side band thread for more purposes

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 976be84504b8285d43dc890b02ceff432cd0dd4b
Gitweb: https://git.kernel.org/tip/976be84504b8285d43dc890b02ceff432cd0dd4b
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Mon, 27 Apr 2020 17:54:27 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

perf evlist: Allow reusing the side band thread for more purposes

I.e. so far we had just one event in that side band thread, a dummy one
with attr.bpf_event set, so that 'perf record' can go ahead and ask the
kernel for further information about BPF programs being loaded.

Allow for more than one event to be there, so that we can use it as
well for the upcoming --switch-output-event feature.

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/evlist.h | 1 +
tools/perf/util/sideband_evlist.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a9d01a1..93de63e 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -111,6 +111,7 @@ int perf_evlist__add_sb_event(struct evlist *evlist,
struct perf_event_attr *attr,
perf_evsel__sb_cb_t cb,
void *data);
+void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data);
int perf_evlist__start_sb_thread(struct evlist *evlist,
struct target *target);
void perf_evlist__stop_sb_thread(struct evlist *evlist);
diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
index 073d201..1d6f470 100644
--- a/tools/perf/util/sideband_evlist.c
+++ b/tools/perf/util/sideband_evlist.c
@@ -4,6 +4,7 @@
#include "util/evlist.h"
#include "util/evsel.h"
#include "util/mmap.h"
+#include "util/perf_api_probe.h"
#include <perf/mmap.h>
#include <linux/perf_event.h>
#include <limits.h>
@@ -80,6 +81,19 @@ static void *perf_evlist__poll_thread(void *arg)
return NULL;
}

+void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ evsel->core.attr.sample_id_all = 1;
+ evsel->core.attr.watermark = 1;
+ evsel->core.attr.wakeup_watermark = 1;
+ evsel->side_band.cb = cb;
+ evsel->side_band.data = data;
+ }
+}
+
int perf_evlist__start_sb_thread(struct evlist *evlist, struct target *target)
{
struct evsel *counter;
@@ -90,6 +104,15 @@ int perf_evlist__start_sb_thread(struct evlist *evlist, struct target *target)
if (perf_evlist__create_maps(evlist, target))
goto out_delete_evlist;

+ if (evlist->core.nr_entries > 1) {
+ bool can_sample_identifier = perf_can_sample_identifier();
+
+ evlist__for_each_entry(evlist, counter)
+ perf_evsel__set_sample_id(counter, can_sample_identifier);
+
+ perf_evlist__set_id_pos(evlist);
+ }
+
evlist__for_each_entry(evlist, counter) {
if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
goto out_delete_evlist;

Subject: [tip: perf/core] perf record: Introduce --switch-output-event

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 899e5ffbf246a30986ced9dd48092c408978afc7
Gitweb: https://git.kernel.org/tip/899e5ffbf246a30986ced9dd48092c408978afc7
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Mon, 27 Apr 2020 17:56:37 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

perf record: Introduce --switch-output-event

Now we can use it with --overwrite to have a flight recorder mode that
gets snapshot requests from arbitrary events that are processed in the
side band thread together with the PERF_RECORD_BPF_EVENT processing.

Example:

To collect scheduler events until a recvmmsg syscall happens, system
wide:

[root@five a]# rm -f perf.data.2020042717*
[root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042717585458 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042717590235 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042717590398 ]
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2020042717590511 ]
[ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ]

So in the above case we had 3 snapshots, the fourth was forced by
control+C:

[root@five a]# ls -la
total 20440
drwxr-xr-x. 2 root root 4096 Apr 27 17:59 .
dr-xr-x---. 12 root root 4096 Apr 27 17:46 ..
-rw-------. 1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458
-rw-------. 1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235
-rw-------. 1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398
-rw-------. 1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511
[root@five a]#

One can make this more precise by adding the switch output event to the
main -e events list: since the switch is handled asynchronously, a few
events after the signal event will appear in the snapshots, as can be
seen with:

[root@five a]# rm -f perf.data.20200427175*
[root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042718024203 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042718024301 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2020042718024484 ]
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2020042718024562 ]
[ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ]
[root@five a]# perf script -i perf.data.2020042718024203 | tail -15
PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586...
swapper 0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=...
NetworkManager 1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1...
swapper 0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=...
swapper 0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=...
NetworkManager 1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1
kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894...
swapper 0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=...
NetworkManager 1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage...
perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827...
swapper 0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1...
swapper 0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/...
kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738...
kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203...
perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582...
[root@five a]#

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 13 +++++++-
tools/perf/builtin-record.c | 41 ++++++++++++++++++++---
2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 6e8b464..561ef55 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -556,6 +556,19 @@ overhead. You can still switch them on with:

--switch-output --no-no-buildid --no-no-buildid-cache

+--switch-output-event::
+Events that will cause the switch of the perf.data file, auto-selecting
+--switch-output=signal, the results are similar as internally the side band
+thread will also send a SIGUSR2 to the main one.
+
+Uses the same syntax as --event, it will just not be recorded, serving only to
+switch the perf.data file as soon as the --switch-output event is processed by
+a separate sideband thread.
+
+This sideband thread is also used to other purposes, like processing the
+PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF
+information, etc.
+
--switch-max-files=N::

When rotating perf.data with --switch-output, only keep N files.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5b6a1d2..bb5b4d2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -88,7 +88,9 @@ struct record {
struct evlist *evlist;
struct perf_session *session;
struct evlist *sb_evlist;
+ pthread_t thread_id;
int realtime_prio;
+ bool switch_output_event_set;
bool no_buildid;
bool no_buildid_set;
bool no_buildid_cache;
@@ -1437,6 +1439,13 @@ out:
return err;
}

+static int record__process_signal_event(union perf_event *event __maybe_unused, void *data)
+{
+ struct record *rec = data;
+ pthread_kill(rec->thread_id, SIGUSR2);
+ return 0;
+}
+
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -1581,12 +1590,24 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}

- if (!opts->no_bpf_event) {
- rec->sb_evlist = evlist__new();
+ if (rec->sb_evlist != NULL) {
+ /*
+ * We get here if --switch-output-event populated the
+ * sb_evlist, so associate a callback that will send a SIGUSR2
+ * to the main thread.
+ */
+ evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
+ rec->thread_id = pthread_self();
+ }

+ if (!opts->no_bpf_event) {
if (rec->sb_evlist == NULL) {
- pr_err("Couldn't create side band evlist.\n.");
- goto out_child;
+ rec->sb_evlist = evlist__new();
+
+ if (rec->sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ goto out_child;
+ }
}

if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
@@ -2180,10 +2201,19 @@ static int switch_output_setup(struct record *rec)
};
unsigned long val;

+ /*
+ * If we're using --switch-output-events, then we imply its
+ * --switch-output=signal, as we'll send a SIGUSR2 from the side band
+ * thread to its parent.
+ */
+ if (rec->switch_output_event_set)
+ goto do_signal;
+
if (!s->set)
return 0;

if (!strcmp(s->str, "signal")) {
+do_signal:
s->signal = true;
pr_debug("switch-output with SIGUSR2 signal\n");
goto enabled;
@@ -2441,6 +2471,9 @@ static struct option __record_options[] = {
&record.switch_output.set, "signal or size[BKMG] or time[smhd]",
"Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold",
"signal"),
+ OPT_CALLBACK_SET(0, "switch-output-event", &record.sb_evlist, &record.switch_output_event_set, "switch output event",
+ "switch output event selector. use 'perf list' to list available events",
+ parse_events_option_new_evlist),
OPT_INTEGER(0, "switch-max-files", &record.switch_output.num_files,
"Limit number of switch output generated files"),
OPT_BOOLEAN(0, "dry-run", &dry_run,
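
The switch_output_setup() hunk above makes --switch-output-event imply --switch-output=signal by jumping to the existing "signal" branch. A goto-free sketch of the same decision is shown below, assuming a trimmed-down, hypothetical demo_switch_output struct; the size and time modes of the real function are deliberately left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for perf's 'struct switch_output'. */
struct demo_switch_output {
	bool set;		/* --switch-output was given */
	bool signal;		/* rotate perf.data on SIGUSR2 */
	bool enabled;
	const char *str;	/* argument of --switch-output, if any */
};

/* Returns 0 on success, -1 for modes this sketch does not parse. */
int demo_switch_output_setup(struct demo_switch_output *s, bool event_set)
{
	/* --switch-output-event implies --switch-output=signal. */
	if (event_set || (s->set && s->str && !strcmp(s->str, "signal"))) {
		s->signal = true;
		s->enabled = true;
		return 0;
	}

	if (!s->set)
		return 0;	/* no rotation requested */

	return -1;		/* size[BKMG] and time[smhd] parsing left out */
}
```

Checking event_set first matters: when only --switch-output-event is given, s->str may never have been filled in, so it must not be compared before the implied signal mode is selected.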

Subject: [tip: perf/core] perf top: Move sb_evlist to 'struct perf_top'

The following commit has been merged into the perf/core branch of tip:

Commit-ID: ca6c9c8b107f9788662117587cd24bbb19cea94d
Gitweb: https://git.kernel.org/tip/ca6c9c8b107f9788662117587cd24bbb19cea94d
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Fri, 24 Apr 2020 10:40:54 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

perf top: Move sb_evlist to 'struct perf_top'

Where state related to a 'perf top' session is grouped.

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-top.c | 7 +++----
tools/perf/util/top.h | 2 +-
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6b067a5..70e1c73 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1580,7 +1580,6 @@ int cmd_top(int argc, const char **argv)
OPTS_EVSWITCH(&top.evswitch),
OPT_END()
};
- struct evlist *sb_evlist = NULL;
const char * const top_usage[] = {
"perf top [<options>]",
NULL
@@ -1744,9 +1743,9 @@ int cmd_top(int argc, const char **argv)
}

if (!top.record_opts.no_bpf_event)
- bpf_event__add_sb_event(&sb_evlist, &perf_env);
+ bpf_event__add_sb_event(&top.sb_evlist, &perf_env);

- if (perf_evlist__start_sb_thread(sb_evlist, target)) {
+ if (perf_evlist__start_sb_thread(top.sb_evlist, target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
opts->no_bpf_event = true;
}
@@ -1754,7 +1753,7 @@ int cmd_top(int argc, const char **argv)
status = __cmd_top(&top);

if (!opts->no_bpf_event)
- perf_evlist__stop_sb_thread(sb_evlist);
+ perf_evlist__stop_sb_thread(top.sb_evlist);

out_delete_evlist:
evlist__delete(top.evlist);
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index 45dc84d..ff83912 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -18,7 +18,7 @@ struct perf_session;

struct perf_top {
struct perf_tool tool;
- struct evlist *evlist;
+ struct evlist *evlist, *sb_evlist;
struct record_opts record_opts;
struct annotation_options annotation_opts;
struct evswitch evswitch;

Subject: [tip: perf/core] perf parse-events: Add parse_events_option() variant that creates evlist

The following commit has been merged into the perf/core branch of tip:

Commit-ID: d0abbc3ce695437fe83446aef44b2f5ef65a80b9
Gitweb: https://git.kernel.org/tip/d0abbc3ce695437fe83446aef44b2f5ef65a80b9
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Mon, 27 Apr 2020 13:58:11 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

perf parse-events: Add parse_events_option() variant that creates evlist

For the upcoming --switch-output-event option we want to create the side
band evlist on first use and populate it with the specified events. If
the option is present multiple times, later occurrences go on adding to
the same evlist. Then, if BPF tracking is required, the first event is
used to set attr.bpf_event so that the PERF_RECORD_BPF_EVENT metadata
events are collected too.

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/parse-events.c | 23 +++++++++++++++++++++++
tools/perf/util/parse-events.h | 1 +
2 files changed, 24 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1010774..5795f3a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2190,6 +2190,29 @@ int parse_events_option(const struct option *opt, const char *str,
return ret;
}

+int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset)
+{
+ struct evlist **evlistp = opt->value;
+ int ret;
+
+ if (*evlistp == NULL) {
+ *evlistp = evlist__new();
+
+ if (*evlistp == NULL) {
+ fprintf(stderr, "Not enough memory to create evlist\n");
+ return -1;
+ }
+ }
+
+ ret = parse_events_option(opt, str, unset);
+ if (ret) {
+ evlist__delete(*evlistp);
+ *evlistp = NULL;
+ }
+
+ return ret;
+}
+
static int
foreach_evsel_in_last_glob(struct evlist *evlist,
int (*func)(struct evsel *evsel,
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 27596cb..6ead966 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -31,6 +31,7 @@ bool have_tracepoints(struct list_head *evlist);
const char *event_type(int type);

int parse_events_option(const struct option *opt, const char *str, int unset);
+int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset);
int parse_events(struct evlist *evlist, const char *str,
struct parse_events_error *error);
int parse_events_terms(struct list_head *terms, const char *str);
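
The create-on-first-use, append-afterwards behaviour described in the
changelog above can be sketched outside of perf. This is only an
illustration: 'struct name_list' and its helpers are hypothetical
stand-ins for 'struct evlist' and the event parser, and only the control
flow of option_new_list() mirrors parse_events_option_new_evlist().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's 'struct evlist': a growable list of names. */
struct name_list {
	char **names;
	size_t nr;
};

static struct name_list *name_list__new(void)
{
	return calloc(1, sizeof(struct name_list));
}

static void name_list__delete(struct name_list *list)
{
	if (list == NULL)
		return;
	for (size_t i = 0; i < list->nr; i++)
		free(list->names[i]);
	free(list->names);
	free(list);
}

/* Stand-in for the parser: rejects empty strings, appends anything else. */
static int name_list__add(struct name_list *list, const char *str)
{
	char **names;
	char *copy;

	if (str == NULL || *str == '\0')
		return -1;

	copy = malloc(strlen(str) + 1);
	if (copy == NULL)
		return -1;
	strcpy(copy, str);

	names = realloc(list->names, (list->nr + 1) * sizeof(*names));
	if (names == NULL) {
		free(copy);
		return -1;
	}
	list->names = names;
	list->names[list->nr++] = copy;
	return 0;
}

/*
 * Same shape as parse_events_option_new_evlist(): allocate on first use,
 * keep appending on later calls, drop everything if parsing fails.
 */
static int option_new_list(struct name_list **listp, const char *str)
{
	int ret;

	assert(listp != NULL);

	if (*listp == NULL) {
		*listp = name_list__new();
		if (*listp == NULL) {
			fprintf(stderr, "Not enough memory to create list\n");
			return -1;
		}
	}

	ret = name_list__add(*listp, str);
	if (ret) {
		name_list__delete(*listp);
		*listp = NULL;
	}
	return ret;
}
```

Calling option_new_list() twice with the same pointer leaves a single
list holding two entries. A failed call frees the list and resets the
pointer to NULL, so the caller can treat NULL as "nothing was asked
for".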

Subject: [tip: perf/core] perf bpf: Decouple creating the evlist from adding the SB event

The following commit has been merged into the perf/core branch of tip:

Commit-ID: b38d85ef49cf6af9d1deaaf01daf0986d47e6c7a
Gitweb: https://git.kernel.org/tip/b38d85ef49cf6af9d1deaaf01daf0986d47e6c7a
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Fri, 24 Apr 2020 12:24:51 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

perf bpf: Decouple creating the evlist from adding the SB event

Rename bpf_event__add_sb_event() to evlist__add_bpf_sb_event() and
require that the evlist be allocated beforehand.

This will allow the same side band thread and evlist to be used for
multiple purposes, in addition to reacting to PERF_RECORD_BPF_EVENT
records soon after they are generated.

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 17 ++++++++++++++---
tools/perf/builtin-top.c | 15 +++++++++++++--
tools/perf/util/bpf-event.c | 3 +--
tools/perf/util/bpf-event.h | 7 +++----
tools/perf/util/evlist.c | 21 ++++-----------------
tools/perf/util/evlist.h | 2 +-
6 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a6d887d..5b6a1d2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1573,16 +1573,27 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}

+ err = -1;
if (!rec->no_buildid
&& !perf_header__has_feat(&session->header, HEADER_BUILD_ID)) {
pr_err("Couldn't generate buildids. "
"Use --no-buildid to profile anyway.\n");
- err = -1;
goto out_child;
}

- if (!opts->no_bpf_event)
- bpf_event__add_sb_event(&rec->sb_evlist, &session->header.env);
+ if (!opts->no_bpf_event) {
+ rec->sb_evlist = evlist__new();
+
+ if (rec->sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ goto out_child;
+ }
+
+ if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
+ pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
+ goto out_child;
+ }
+ }

if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 70e1c73..de24ace 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1742,8 +1742,19 @@ int cmd_top(int argc, const char **argv)
goto out_delete_evlist;
}

- if (!top.record_opts.no_bpf_event)
- bpf_event__add_sb_event(&top.sb_evlist, &perf_env);
+ if (!top.record_opts.no_bpf_event) {
+ top.sb_evlist = evlist__new();
+
+ if (top.sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ goto out_delete_evlist;
+ }
+
+ if (evlist__add_bpf_sb_event(top.sb_evlist, &perf_env)) {
+ pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
+ goto out_delete_evlist;
+ }
+ }

if (perf_evlist__start_sb_thread(top.sb_evlist, target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 0cd41a8..3742511 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -509,8 +509,7 @@ static int bpf_event__sb_cb(union perf_event *event, void *data)
return 0;
}

-int bpf_event__add_sb_event(struct evlist **evlist,
- struct perf_env *env)
+int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env)
{
struct perf_event_attr attr = {
.type = PERF_TYPE_SOFTWARE,
diff --git a/tools/perf/util/bpf-event.h b/tools/perf/util/bpf-event.h
index 81fdc88..68f315c 100644
--- a/tools/perf/util/bpf-event.h
+++ b/tools/perf/util/bpf-event.h
@@ -33,8 +33,7 @@ struct btf_node {
#ifdef HAVE_LIBBPF_SUPPORT
int machine__process_bpf(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
-int bpf_event__add_sb_event(struct evlist **evlist,
- struct perf_env *env);
+int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env);
void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
struct perf_env *env,
FILE *fp);
@@ -46,8 +45,8 @@ static inline int machine__process_bpf(struct machine *machine __maybe_unused,
return 0;
}

-static inline int bpf_event__add_sb_event(struct evlist **evlist __maybe_unused,
- struct perf_env *env __maybe_unused)
+static inline int evlist__add_bpf_sb_event(struct evlist *evlist __maybe_unused,
+ struct perf_env *env __maybe_unused)
{
return 0;
}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3f7e7d5..6fe11f4 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1705,39 +1705,26 @@ struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
return leader;
}

-int perf_evlist__add_sb_event(struct evlist **evlist,
+int perf_evlist__add_sb_event(struct evlist *evlist,
struct perf_event_attr *attr,
perf_evsel__sb_cb_t cb,
void *data)
{
struct evsel *evsel;
- bool new_evlist = (*evlist) == NULL;
-
- if (*evlist == NULL)
- *evlist = evlist__new();
- if (*evlist == NULL)
- return -1;

if (!attr->sample_id_all) {
pr_warning("enabling sample_id_all for all side band events\n");
attr->sample_id_all = 1;
}

- evsel = perf_evsel__new_idx(attr, (*evlist)->core.nr_entries);
+ evsel = perf_evsel__new_idx(attr, evlist->core.nr_entries);
if (!evsel)
- goto out_err;
+ return -1;

evsel->side_band.cb = cb;
evsel->side_band.data = data;
- evlist__add(*evlist, evsel);
+ evlist__add(evlist, evsel);
return 0;
-
-out_err:
- if (new_evlist) {
- evlist__delete(*evlist);
- *evlist = NULL;
- }
- return -1;
}

static void *perf_evlist__poll_thread(void *arg)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 48622e5..a9d01a1 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -107,7 +107,7 @@ int __perf_evlist__add_default_attrs(struct evlist *evlist,

int perf_evlist__add_dummy(struct evlist *evlist);

-int perf_evlist__add_sb_event(struct evlist **evlist,
+int perf_evlist__add_sb_event(struct evlist *evlist,
struct perf_event_attr *attr,
perf_evsel__sb_cb_t cb,
void *data);

Subject: [tip: perf/core] perf record: Move sb_evlist to 'struct record'

The following commit has been merged into the perf/core branch of tip:

Commit-ID: bc477d7983e345262757568ec27be0395dc2fe73
Gitweb: https://git.kernel.org/tip/bc477d7983e345262757568ec27be0395dc2fe73
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Fri, 24 Apr 2020 10:24:04 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:28 -03:00

perf record: Move sb_evlist to 'struct record'

Where state related to a 'perf record' session is grouped.

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bf3a6f7..a6d887d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -87,6 +87,7 @@ struct record {
struct auxtrace_record *itr;
struct evlist *evlist;
struct perf_session *session;
+ struct evlist *sb_evlist;
int realtime_prio;
bool no_buildid;
bool no_buildid_set;
@@ -1447,7 +1448,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
struct perf_data *data = &rec->data;
struct perf_session *session;
bool disabled = false, draining = false;
- struct evlist *sb_evlist = NULL;
int fd;
float ratio = 0;

@@ -1582,9 +1582,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}

if (!opts->no_bpf_event)
- bpf_event__add_sb_event(&sb_evlist, &session->header.env);
+ bpf_event__add_sb_event(&rec->sb_evlist, &session->header.env);

- if (perf_evlist__start_sb_thread(sb_evlist, &rec->opts.target)) {
+ if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
opts->no_bpf_event = true;
}
@@ -1858,7 +1858,7 @@ out_delete_session:
perf_session__delete(session);

if (!opts->no_bpf_event)
- perf_evlist__stop_sb_thread(sb_evlist);
+ perf_evlist__stop_sb_thread(rec->sb_evlist);
return status;
}

Subject: [tip: perf/core] perf record: Move side band evlist setup to separate routine

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 23cbb41c939a09a4b51eabacdb1f68af210c084d
Gitweb: https://git.kernel.org/tip/23cbb41c939a09a4b51eabacdb1f68af210c084d
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Tue, 28 Apr 2020 14:58:29 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

perf record: Move side band evlist setup to separate routine

The side band evlist setup in __cmd_record() is quite big by now, so
move that code to a separate record__setup_sb_evlist() routine.

Suggested-by: Jiri Olsa <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 71 ++++++++++++++++++++----------------
1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bb5b4d2..cfb9a69 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1446,6 +1446,44 @@ static int record__process_signal_event(union perf_event *event __maybe_unused,
return 0;
}

+static int record__setup_sb_evlist(struct record *rec)
+{
+ struct record_opts *opts = &rec->opts;
+
+ if (rec->sb_evlist != NULL) {
+ /*
+ * We get here if --switch-output-event populated the
+ * sb_evlist, so associate a callback that will send a SIGUSR2
+ * to the main thread.
+ */
+ evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
+ rec->thread_id = pthread_self();
+ }
+
+ if (!opts->no_bpf_event) {
+ if (rec->sb_evlist == NULL) {
+ rec->sb_evlist = evlist__new();
+
+ if (rec->sb_evlist == NULL) {
+ pr_err("Couldn't create side band evlist.\n.");
+ return -1;
+ }
+ }
+
+ if (evlist__add_bpf_sb_event(rec->sb_evlist, &rec->session->header.env)) {
+ pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
+ return -1;
+ }
+ }
+
+ if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
+ pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
+ opts->no_bpf_event = true;
+ }
+
+ return 0;
+}
+
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -1590,36 +1628,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}

- if (rec->sb_evlist != NULL) {
- /*
- * We get here if --switch-output-event populated the
- * sb_evlist, so associate a callback that will send a SIGUSR2
- * to the main thread.
- */
- evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
- rec->thread_id = pthread_self();
- }
-
- if (!opts->no_bpf_event) {
- if (rec->sb_evlist == NULL) {
- rec->sb_evlist = evlist__new();
-
- if (rec->sb_evlist == NULL) {
- pr_err("Couldn't create side band evlist.\n.");
- goto out_child;
- }
- }
-
- if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
- pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n.");
- goto out_child;
- }
- }
-
- if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
- pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
- opts->no_bpf_event = true;
- }
+ err = record__setup_sb_evlist(rec);
+ if (err)
+ goto out_child;

err = record__synthesize(rec, false);
if (err < 0)

Subject: [tip: perf/core] libsubcmd: Introduce OPT_CALLBACK_SET()

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 636eb4d001b18922b149f59a9f924a5907d20d39
Gitweb: https://git.kernel.org/tip/636eb4d001b18922b149f59a9f924a5907d20d39
Author: Arnaldo Carvalho de Melo <[email protected]>
AuthorDate: Tue, 28 Apr 2020 09:16:25 -03:00
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitterDate: Tue, 05 May 2020 16:35:29 -03:00

libsubcmd: Introduce OPT_CALLBACK_SET()

To register that an option was set, as needed by the upcoming 'perf
record --switch-output-event' option.

Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/lib/subcmd/parse-options.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/lib/subcmd/parse-options.h b/tools/lib/subcmd/parse-options.h
index af9def5..d241414 100644
--- a/tools/lib/subcmd/parse-options.h
+++ b/tools/lib/subcmd/parse-options.h
@@ -151,6 +151,8 @@ struct option {
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = "time", .help = (h), .callback = parse_opt_approxidate_cb }
#define OPT_CALLBACK(s, l, v, a, h, f) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f) }
+#define OPT_CALLBACK_SET(s, l, v, os, a, h, f) \
+ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f), .set = check_vtype(os, bool *)}
#define OPT_CALLBACK_NOOPT(s, l, v, a, h, f) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = (a), .help = (h), .callback = (f), .flags = PARSE_OPT_NOARG }
#define OPT_CALLBACK_DEFAULT(s, l, v, a, h, f, d) \
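
What the extra 'os' argument of OPT_CALLBACK_SET() buys can be shown
with a reduced model. Everything below is an assumption made for
illustration: 'struct mini_option' and mini_parse() are hypothetical and
are not libsubcmd code. The sketch sets the flag before running the
callback, which is a choice of this example and is not taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for libsubcmd's 'struct option'. */
struct mini_option {
	const char *long_name;
	void *value;	/* handed to the callback */
	bool *set;	/* optional: becomes true when the option is seen */
	int (*callback)(const struct mini_option *opt, const char *arg);
};

/* Example callback: count how many times the option was given. */
static int count_cb(const struct mini_option *opt, const char *arg)
{
	(void)arg;
	*(int *)opt->value += 1;
	return 0;
}

/* Look up the option by name, record that it was set, run the callback. */
static int mini_parse(const struct mini_option *opts, size_t nr_opts,
		      const char *name, const char *arg)
{
	assert(name != NULL);

	for (size_t i = 0; i < nr_opts; i++) {
		if (strcmp(opts[i].long_name, name) != 0)
			continue;
		if (opts[i].set != NULL)
			*opts[i].set = true;
		return opts[i].callback(&opts[i], arg);
	}
	return -1;	/* unknown option */
}
```

The tool can then test the bool after parsing, independently of
whatever the callback stored through 'value'.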

2020-05-11 15:07:39

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 7/8] perf record: Introduce --switch-output-event

On Wed, Apr 29, 2020 at 10:25:52AM -0700, Ian Rogers wrote:
> On Wed, Apr 29, 2020 at 6:14 AM Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> >
> > From: Arnaldo Carvalho de Melo <[email protected]>
> >
> > Now we can use it with --overwrite to have a flight recorder mode that
> > gets snapshot requests from arbitrary events that are processed in the
> > side band thread together with the PERF_RECORD_BPF_EVENT processing.
> >
> > Example:
> >
> > To collect scheduler events until a recvmmsg syscall happens, system
> > wide:
> >
> > [root@five a]# rm -f perf.data.2020042717*
> > [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
> > [ perf record: dump data: Woken up 1 times ]
> > [ perf record: Dump perf.data.2020042717585458 ]
> > [ perf record: dump data: Woken up 1 times ]
> > [ perf record: Dump perf.data.2020042717590235 ]
> > [ perf record: dump data: Woken up 1 times ]
> > [ perf record: Dump perf.data.2020042717590398 ]
> > ^C[ perf record: Woken up 1 times to write data ]
> > [ perf record: Dump perf.data.2020042717590511 ]
> > [ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ]
> >
> > So in the above case we had 3 snapshots, the fourth was forced by
> > control+C:
> >
> > [root@five a]# ls -la
> > total 20440
> > drwxr-xr-x. 2 root root 4096 Apr 27 17:59 .
> > dr-xr-x---. 12 root root 4096 Apr 27 17:46 ..
> > -rw-------. 1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458
> > -rw-------. 1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235
> > -rw-------. 1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398
> > -rw-------. 1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511
> > [root@five a]#
> >
> > One can make this more precise by adding the switch output event to the
> > main -e events list, as since this is done asynchronously, a few events
> > after the signal event will appear in the snapshots, as can be seen
> > with:
> >
> > [root@five a]# rm -f perf.data.20200427175*
> > [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
> > [ perf record: dump data: Woken up 1 times ]
> > [ perf record: Dump perf.data.2020042718024203 ]
> > [ perf record: dump data: Woken up 1 times ]
> > [ perf record: Dump perf.data.2020042718024301 ]
> > [ perf record: dump data: Woken up 1 times ]
> > [ perf record: Dump perf.data.2020042718024484 ]
> > ^C[ perf record: Woken up 1 times to write data ]
> > [ perf record: Dump perf.data.2020042718024562 ]
> > [ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ]
> > [root@five a]# perf script -i perf.data.2020042718024203 | tail -15
> > PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586...
> > swapper 0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=...
> > NetworkManager 1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1...
> > swapper 0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=...
> > swapper 0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=...
> > NetworkManager 1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1
> > kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894...
> > swapper 0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=...
> > NetworkManager 1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage...
> > perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827...
> > swapper 0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1...
> > swapper 0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/...
> > kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738...
> > kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203...
> > perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582...
> > [root@five a]#
> >
> > Cc: Adrian Hunter <[email protected]>
> > Cc: Jiri Olsa <[email protected]>
> > Cc: Namhyung Kim <[email protected]>
> > Cc: Song Liu <[email protected]>
> > Cc: Wang Nan <[email protected]>
> > Link: http://lore.kernel.org/lkml/[email protected]
> > Link: http://lore.kernel.org/lkml/[email protected]
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > ---
> > tools/perf/Documentation/perf-record.txt | 13 ++++++++
> > tools/perf/builtin-record.c | 41 +++++++++++++++++++++---
> > 2 files changed, 50 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> > index 6e8b4649307c..561ef55743e2 100644
> > --- a/tools/perf/Documentation/perf-record.txt
> > +++ b/tools/perf/Documentation/perf-record.txt
> > @@ -556,6 +556,19 @@ overhead. You can still switch them on with:
> >
> > --switch-output --no-no-buildid --no-no-buildid-cache
> >
> > +--switch-output-event::
> > +Events that will cause the switch of the perf.data file, auto-selecting
> > +--switch-output=signal, the results are similar as internally the side band
> > +thread will also send a SIGUSR2 to the main one.
>
> I found this paragraph a little hard, perhaps:
> A list of events that when they occur cause the output perf.data file
> to be ended and a new one created. The signal event,
> --switch-output=signal, is auto-selected as SIGUSR2 is used internally
> by the thread monitoring the events.

Good suggestions (the one above and the one at the end of this message).
Can you please put those in a formal patch?

Thanks,

- Arnaldo

> > +Uses the same syntax as --event, it will just not be recorded, serving only to
> > +switch the perf.data file as soon as the --switch-output event is processed by
> > +a separate sideband thread.
> > +
> > +This sideband thread is also used to other purposes, like processing the
> > +PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF
> > +information, etc.
> > +
> > --switch-max-files=N::
> >
> > When rotating perf.data with --switch-output, only keep N files.
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index ed2244847400..7a6a89972691 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -87,7 +87,9 @@ struct record {
> > struct evlist *evlist;
> > struct perf_session *session;
> > struct evlist *sb_evlist;
> > + pthread_t thread_id;
> > int realtime_prio;
> > + bool switch_output_event_set;
> > bool no_buildid;
> > bool no_buildid_set;
> > bool no_buildid_cache;
> > @@ -1436,6 +1438,13 @@ static int record__synthesize(struct record *rec, bool tail)
> > return err;
> > }
> >
> > +static int record__process_signal_event(union perf_event *event __maybe_unused, void *data)
> > +{
> > + struct record *rec = data;
> > + pthread_kill(rec->thread_id, SIGUSR2);
> > + return 0;
> > +}
> > +
> > static int __cmd_record(struct record *rec, int argc, const char **argv)
> > {
> > int err;
> > @@ -1580,12 +1589,24 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> > goto out_child;
> > }
> >
> > - if (!opts->no_bpf_event) {
> > - rec->sb_evlist = evlist__new();
> > + if (rec->sb_evlist != NULL) {
> > + /*
> > + * We get here if --switch-output-event populated the
> > + * sb_evlist, so associate a callback that will send a SIGUSR2
> > + * to the main thread.
> > + */
> > + evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
> > + rec->thread_id = pthread_self();
> > + }
> >
> > + if (!opts->no_bpf_event) {
> > if (rec->sb_evlist == NULL) {
> > - pr_err("Couldn't create side band evlist.\n.");
> > - goto out_child;
> > + rec->sb_evlist = evlist__new();
> > +
> > + if (rec->sb_evlist == NULL) {
> > + pr_err("Couldn't create side band evlist.\n.");
> > + goto out_child;
> > + }
> > }
> >
> > if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) {
> > @@ -2179,10 +2200,19 @@ static int switch_output_setup(struct record *rec)
> > };
> > unsigned long val;
> >
> > + /*
> > + * If we're using --switch-output-events, then we imply its
> > + * --switch-output=signal, as we'll send a SIGUSR2 from the side band
> > + * thread to its parent.
> > + */
> > + if (rec->switch_output_event_set)
> > + goto do_signal;
> > +
> > if (!s->set)
> > return 0;
> >
> > if (!strcmp(s->str, "signal")) {
> > +do_signal:
> > s->signal = true;
> > pr_debug("switch-output with SIGUSR2 signal\n");
> > goto enabled;
> > @@ -2440,6 +2470,9 @@ static struct option __record_options[] = {
> > &record.switch_output.set, "signal or size[BKMG] or time[smhd]",
> > "Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold",
> > "signal"),
> > + OPT_CALLBACK_SET(0, "switch-output-event", &record.sb_evlist, &record.switch_output_event_set, "switch output event",
> > + "switch output event selector. use 'perf list' to list available events",
>
> Perhaps:
> "A list of events, see 'perf list', that when they occur cause the end
> of one perf.data file and the creation of another"
>
> Thanks,
> Ian
>
> > + parse_events_option_new_evlist),
> > OPT_INTEGER(0, "switch-max-files", &record.switch_output.num_files,
> > "Limit number of switch output generated files"),
> > OPT_BOOLEAN(0, "dry-run", &dry_run,
> > --
> > 2.21.1
> >
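
The mechanism in the quoted patch, where a side band thread calls
pthread_kill() to deliver SIGUSR2 to the main thread, can be tried in
isolation with the sketch below. It is an illustration under stated
assumptions: 'struct recorder', run_once() and the flag set by the
handler are hypothetical. Blocking SIGUSR2 and waiting with sigsuspend()
is done here only to make the demonstration deterministic; it is not
something the patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set from the signal handler, read by the main thread. */
static volatile sig_atomic_t switch_requested;

static void sigusr2_handler(int sig)
{
	(void)sig;
	switch_requested = 1;
}

/* Hypothetical stand-in for 'struct record': only what the callback needs. */
struct recorder {
	pthread_t thread_id;	/* thread that should receive SIGUSR2 */
	int kill_ret;		/* result of pthread_kill(), for the demo */
};

/* Same shape as record__process_signal_event(): poke the main thread. */
static int process_signal_event(void *data)
{
	struct recorder *rec = data;

	assert(rec != NULL);
	rec->kill_ret = pthread_kill(rec->thread_id, SIGUSR2);
	return 0;
}

/* Side band thread body: pretend one trigger event arrived. */
static void *side_band_thread(void *arg)
{
	process_signal_event(arg);
	return NULL;
}

/* Returns 1 if the main thread saw the request, -1 on setup failure. */
static int run_once(void)
{
	struct recorder rec = { .thread_id = pthread_self(), .kill_ret = -1 };
	struct sigaction sa;
	sigset_t block, old, wait_mask;
	pthread_t sb;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigusr2_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR2, &sa, NULL) != 0)
		return -1;

	/* Hold SIGUSR2 pending until we explicitly wait for it. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR2);
	if (pthread_sigmask(SIG_BLOCK, &block, &old) != 0)
		return -1;

	switch_requested = 0;
	if (pthread_create(&sb, NULL, side_band_thread, &rec) != 0) {
		pthread_sigmask(SIG_SETMASK, &old, NULL);
		return -1;
	}
	pthread_join(sb, NULL);

	if (rec.kill_ret == 0) {
		wait_mask = old;
		sigdelset(&wait_mask, SIGUSR2);
		while (!switch_requested)
			sigsuspend(&wait_mask);	/* handler runs here */
	}

	pthread_sigmask(SIG_SETMASK, &old, NULL);
	return rec.kill_ret == 0 ? switch_requested : -1;
}
```

Build with -pthread. In the sketch, as in the patch, the side band
thread does no work beyond sending the signal; acting on the request is
left to the thread that receives it.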