2021-03-24 13:09:12

by Nicholas Fraser

Subject: [PATCH] perf data: export to JSON

This adds preliminary support to dump the contents of a perf.data file to
human-readable JSON.

The "perf data" command currently only supports exporting to Common Trace
Format and it doesn't do symbol resolution among other things. Dumping to JSON
means the data can be trivially parsed by anything without any dependencies
(besides a JSON parser.) We use this to import the data into a tool on Windows
where integrating perf or libbabeltrace is impractical.

The JSON is encoded using some trivial fprintf() commands; there is no
dependency on any JSON library. It currently only outputs samples. Other stuff
like processes and mappings could easily be added as needed. The output is of
course huge but it compresses well enough.

Use it like this:

perf data convert --to-json out.json

Here's what the output looks like:

{
    "linux-perf-json-version": 1,
    "samples": [
        {
            "timestamp": 3074717308597,
            "pid": 8604,
            "tid": 8604,
            "comm": "sh",
            "callchain": [
                {
                    "ip": "0x7f1e0deb2d36",
                    "symbol": "__strcmp_avx2",
                    "dso": "libc-2.33.so"
                },
                {
                    "ip": "0x7f1e0dd7f49f",
                    "symbol": "__gconv_find_transform",
                    "dso": "libc-2.33.so"
                },
                {
                    "ip": "0x7f1e0de0b71c",
                    "symbol": "__wcsmbs_load_conv",
                    "dso": "libc-2.33.so"
                }
            ]
        },
        ...
    ]
}

Signed-off-by: Nicholas Fraser <[email protected]>
---
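Note (not part of the commit): for anyone who wants to try the string
escaping outside of perf, here is a minimal standalone sketch of the same
idea as output_json_string(). The name json_escape_string() is made up for
this example, and it assumes the input is UTF-8 that should pass through
unchanged. It does not escape '/', which RFC 8259 permits but does not
require, and it converts each byte to unsigned char before the
control-character test so that bytes >= 0x80 are not mistaken for control
characters on platforms where plain char is signed.

```c
#include <stdio.h>

/*
 * Write s to out as a JSON string literal (hypothetical helper, not perf API).
 * Escapes '"', '\\' and the control characters U+0000..U+001F as RFC 8259
 * requires; every other byte is copied through unchanged.
 */
void json_escape_string(FILE *out, const char *s)
{
	fputc('"', out);
	for (; *s; ++s) {
		/* Convert first so bytes >= 0x80 stay positive if char is signed. */
		unsigned char ch = (unsigned char)*s;

		switch (ch) {
		case '"':  fputs("\\\"", out); break;
		case '\\': fputs("\\\\", out); break;
		case '\b': fputs("\\b", out); break;
		case '\f': fputs("\\f", out); break;
		case '\n': fputs("\\n", out); break;
		case '\r': fputs("\\r", out); break;
		case '\t': fputs("\\t", out); break;
		default:
			if (ch <= 0x1f)
				fprintf(out, "\\u%04x", (unsigned int)ch);
			else
				fputc(ch, out);
			break;
		}
	}
	fputc('"', out);
}
```

Calling json_escape_string(stdout, name) on a symbol or comm string writes
the quoted, escaped form; the caller is still responsible for the
surrounding keys and commas.
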
tools/perf/Documentation/perf-data.txt | 5 +-
tools/perf/builtin-data.c | 39 ++++-
tools/perf/util/Build | 1 +
tools/perf/util/data-convert-json.c | 228 +++++++++++++++++++++++++
tools/perf/util/data-convert-json.h | 9 +
tools/perf/util/data-convert.h | 2 +
6 files changed, 276 insertions(+), 8 deletions(-)
create mode 100644 tools/perf/util/data-convert-json.c
create mode 100644 tools/perf/util/data-convert-json.h

diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
index 726b9bc9e1a7..417bf17e265c 100644
--- a/tools/perf/Documentation/perf-data.txt
+++ b/tools/perf/Documentation/perf-data.txt
@@ -17,7 +17,7 @@ Data file related processing.
COMMANDS
--------
convert::
- Converts perf data file into another format (only CTF [1] format is support by now).
+ Converts perf data file into another format.
It's possible to set data-convert debug variable to get debug messages from conversion,
like:
perf --debug data-convert data convert ...
@@ -27,6 +27,9 @@ OPTIONS for 'convert'
--to-ctf::
Triggers the CTF conversion, specify the path of CTF data directory.

+--to-json::
+ Triggers JSON conversion. Specify the JSON filename to output.
+
--tod::
Convert time to wall clock time.

diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
index 8d23b8d6ee8e..64546ba517a5 100644
--- a/tools/perf/builtin-data.c
+++ b/tools/perf/builtin-data.c
@@ -8,6 +8,7 @@
#include <subcmd/parse-options.h>
#include "data-convert.h"
#include "data-convert-bt.h"
+#include "data-convert-json.h"

typedef int (*data_cmd_fn_t)(int argc, const char **argv);

@@ -55,7 +56,8 @@ static const char * const data_convert_usage[] = {

static int cmd_data_convert(int argc, const char **argv)
{
- const char *to_ctf = NULL;
+ const char *to_json = NULL;
+ const char *to_ctf = NULL;
struct perf_data_convert_opts opts = {
.force = false,
.all = false,
@@ -63,6 +65,7 @@ static int cmd_data_convert(int argc, const char **argv)
const struct option options[] = {
OPT_INCR('v', "verbose", &verbose, "be more verbose"),
OPT_STRING('i', "input", &input_name, "file", "input file name"),
+ OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"),
#ifdef HAVE_LIBBABELTRACE_SUPPORT
OPT_STRING(0, "to-ctf", &to_ctf, NULL, "Convert to CTF format"),
OPT_BOOLEAN(0, "tod", &opts.tod, "Convert time to wall clock time"),
@@ -72,11 +75,6 @@ static int cmd_data_convert(int argc, const char **argv)
OPT_END()
};

-#ifndef HAVE_LIBBABELTRACE_SUPPORT
- pr_err("No conversion support compiled in. perf should be compiled with environment variables LIBBABELTRACE=1 and LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");
- return -1;
-#endif
-
argc = parse_options(argc, argv, options,
data_convert_usage, 0);
if (argc) {
@@ -84,11 +82,38 @@ static int cmd_data_convert(int argc, const char **argv)
return -1;
}

+ if (to_json && to_ctf) {
+ pr_err("You cannot specify both --to-ctf and --to-json.\n");
+ return -1;
+ }
+ if (!to_json && !to_ctf) {
+ pr_err("You must specify one of --to-ctf or --to-json.\n");
+ return -1;
+ }
+
+ if (to_json) {
+ if (opts.all) {
+ pr_err("--all is currently unsupported for JSON output.\n");
+ return -1;
+ }
+ if (opts.tod) {
+ pr_err("--tod is currently unsupported for JSON output.\n");
+ return -1;
+ }
+ if (opts.force) {
+ pr_err("--force is currently unsupported for JSON output.\n");
+ return -1;
+ }
+ return bt_convert__perf2json(input_name, to_json, &opts);
+ }
+
if (to_ctf) {
#ifdef HAVE_LIBBABELTRACE_SUPPORT
return bt_convert__perf2ctf(input_name, to_ctf, &opts);
#else
- pr_err("The libbabeltrace support is not compiled in.\n");
+ pr_err("The libbabeltrace support is not compiled in. perf should be "
+ "compiled with environment variables LIBBABELTRACE=1 and "
+ "LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");
return -1;
#endif
}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e2563d0154eb..de9ac182b25a 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -163,6 +163,7 @@ perf-$(CONFIG_LIBUNWIND_X86) += libunwind/x86_32.o
perf-$(CONFIG_LIBUNWIND_AARCH64) += libunwind/arm64.o

perf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
+perf-y += data-convert-json.o

perf-y += scripting-engines/

diff --git a/tools/perf/util/data-convert-json.c b/tools/perf/util/data-convert-json.c
new file mode 100644
index 000000000000..b19674a9f2b8
--- /dev/null
+++ b/tools/perf/util/data-convert-json.c
@@ -0,0 +1,228 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * JSON export.
+ *
+ * Copyright (C) 2021, CodeWeavers Inc. <[email protected]>
+ */
+
+#include "data-convert-json.h"
+
+#include <unistd.h>
+#include <inttypes.h>
+
+#include "linux/compiler.h"
+#include "linux/err.h"
+#include "util/auxtrace.h"
+#include "util/debug.h"
+#include "util/dso.h"
+#include "util/event.h"
+#include "util/evsel.h"
+#include "util/header.h"
+#include "util/map.h"
+#include "util/session.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/tool.h"
+
+struct convert_json {
+ struct perf_tool tool;
+ FILE *out;
+ bool first;
+};
+
+static void output_json_string(FILE *out, const char *s)
+{
+ fputc('"', out);
+ while (*s) {
+ switch (*s) {
+
+ // required escapes with special forms as per RFC 8259
+ case '"': fprintf(out, "\\\""); break;
+ case '\\': fprintf(out, "\\\\"); break;
+ case '/': fprintf(out, "\\/"); break;
+ case '\b': fprintf(out, "\\b"); break;
+ case '\f': fprintf(out, "\\f"); break;
+ case '\n': fprintf(out, "\\n"); break;
+ case '\r': fprintf(out, "\\r"); break;
+ case '\t': fprintf(out, "\\t"); break;
+
+ default:
+ // all other control characters must be escaped by hex code
+ if (*s <= 0x1f) {
+ fprintf(out, "\\u%04x", *s);
+ } else {
+ fputc(*s, out);
+ }
+ break;
+ }
+
+ ++s;
+ }
+ fputc('"', out);
+}
+
+static void output_sample_callchain_entry(struct perf_tool *tool,
+ u64 ip, struct addr_location *al)
+{
+ struct convert_json *c = container_of(tool, struct convert_json, tool);
+ FILE *out = c->out;
+
+ fprintf(out, "\n\t\t\t\t{");
+ fprintf(out, "\n\t\t\t\t\t\"ip\": \"0x%" PRIx64 "\"", ip);
+
+ if (al && al->sym && al->sym->name && strlen(al->sym->name) > 0) {
+ fprintf(out, ",\n\t\t\t\t\t\"symbol\": ");
+ output_json_string(out, al->sym->name);
+
+ if (al->map && al->map->dso) {
+ const char *dso = al->map->dso->short_name;
+ if (dso && strlen(dso) > 0) {
+ fprintf(out, ",\n\t\t\t\t\t\"dso\": ");
+ output_json_string(out, dso);
+ }
+ }
+ }
+
+ fprintf(out, "\n\t\t\t\t}");
+}
+
+static int process_sample_event(struct perf_tool *tool,
+ union perf_event *event __maybe_unused,
+ struct perf_sample *sample,
+ struct evsel *evsel __maybe_unused,
+ struct machine *machine)
+{
+ struct convert_json *c = container_of(tool, struct convert_json, tool);
+ FILE *out = c->out;
+ struct addr_location al, tal;
+ u8 cpumode = PERF_RECORD_MISC_USER;
+
+ if (machine__resolve(machine, &al, sample) < 0) {
+ return 0;
+ }
+
+ if (c->first) {
+ c->first = false;
+ } else {
+ fprintf(out, ",");
+ }
+ fprintf(out, "\n\t\t{");
+
+ fprintf(out, "\n\t\t\t\"timestamp\": %" PRIi64, sample->time);
+ fprintf(out, ",\n\t\t\t\"pid\": %i", al.thread->pid_);
+ fprintf(out, ",\n\t\t\t\"tid\": %i", al.thread->tid);
+
+ if (al.thread->cpu >= 0) {
+ fprintf(out, ",\n\t\t\t\"cpu\": %i", al.thread->cpu);
+ }
+
+ fprintf(out, ",\n\t\t\t\"comm\": ");
+ output_json_string(out, thread__comm_str(al.thread));
+
+ fprintf(out, ",\n\t\t\t\"callchain\": [");
+ if (sample->callchain) {
+ unsigned int i;
+ bool ok;
+ bool first_callchain = true;
+
+ for (i = 0; i < sample->callchain->nr; ++i) {
+ u64 ip = sample->callchain->ips[i];
+
+ if (ip >= PERF_CONTEXT_MAX) {
+ switch (ip) {
+ case PERF_CONTEXT_HV:
+ cpumode = PERF_RECORD_MISC_HYPERVISOR;
+ break;
+ case PERF_CONTEXT_KERNEL:
+ cpumode = PERF_RECORD_MISC_KERNEL;
+ break;
+ case PERF_CONTEXT_USER:
+ cpumode = PERF_RECORD_MISC_USER;
+ break;
+ default:
+ pr_debug("invalid callchain context: "
+ "%"PRId64"\n", (s64) ip);
+ break;
+ }
+ continue;
+ }
+
+ if (first_callchain) {
+ first_callchain = false;
+ } else {
+ fprintf(out, ",");
+ }
+
+ ok = thread__find_symbol(al.thread, cpumode, ip, &tal);
+ output_sample_callchain_entry(tool, ip, ok ? &tal : NULL);
+ }
+ } else {
+ output_sample_callchain_entry(tool, sample->ip, &al);
+ }
+ fprintf(out, "\n\t\t\t]");
+
+ fprintf(out, "\n\t\t}");
+ return 0;
+}
+
+int bt_convert__perf2json(const char *input_name, const char *output_name,
+ struct perf_data_convert_opts *opts __maybe_unused)
+{
+ struct perf_session *session;
+
+ struct convert_json c = {
+ .tool = {
+ .sample = process_sample_event,
+ .mmap = perf_event__process_mmap,
+ .mmap2 = perf_event__process_mmap2,
+ .comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
+ .cgroup = perf_event__process_cgroup,
+ .exit = perf_event__process_exit,
+ .fork = perf_event__process_fork,
+ .lost = perf_event__process_lost,
+ .tracing_data = perf_event__process_tracing_data,
+ .build_id = perf_event__process_build_id,
+ .id_index = perf_event__process_id_index,
+ .auxtrace_info = perf_event__process_auxtrace_info,
+ .auxtrace = perf_event__process_auxtrace,
+ .event_update = perf_event__process_event_update,
+ .ordered_events = true,
+ .ordering_requires_timestamps = true,
+ },
+ .first = true,
+ };
+
+ struct perf_data data = {
+ .mode = PERF_DATA_MODE_READ,
+ .path = input_name,
+ };
+
+ c.out = fopen(output_name, "w");
+ if (!c.out) {
+ fprintf(stderr, "error opening output file!\n");
+ return -1;
+ }
+
+ session = perf_session__new(&data, false, &c.tool);
+ if (IS_ERR(session)) {
+ fprintf(stderr, "error creating perf session!\n");
+ return -1;
+ }
+
+ if (symbol__init(&session->header.env) < 0) {
+ fprintf(stderr, "symbol init error!\n");
+ return -1;
+ }
+
+ // Version number for future-proofing. Most additions should be able to be
+ // done in a backwards-compatible way so this should only need to be bumped
+ // if some major breaking change must be made.
+ fprintf(c.out, "{\n\t\"linux-perf-json-version\": 1,");
+
+ fprintf(c.out, "\n\t\"samples\": [");
+ perf_session__process_events(session);
+ fprintf(c.out, "\n\t]\n}\n");
+
+ return 0;
+}
diff --git a/tools/perf/util/data-convert-json.h b/tools/perf/util/data-convert-json.h
new file mode 100644
index 000000000000..1fcac5ce3ec1
--- /dev/null
+++ b/tools/perf/util/data-convert-json.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __DATA_CONVERT_JSON_H
+#define __DATA_CONVERT_JSON_H
+#include "data-convert.h"
+
+int bt_convert__perf2json(const char *input_name, const char *to_ctf,
+ struct perf_data_convert_opts *opts);
+
+#endif /* __DATA_CONVERT_JSON_H */
diff --git a/tools/perf/util/data-convert.h b/tools/perf/util/data-convert.h
index feab5f114e37..17c35eb6ab4f 100644
--- a/tools/perf/util/data-convert.h
+++ b/tools/perf/util/data-convert.h
@@ -2,6 +2,8 @@
#ifndef __DATA_CONVERT_H
#define __DATA_CONVERT_H

+#include <stdbool.h>
+
struct perf_data_convert_opts {
bool force;
bool all;
--
2.31.0



2021-03-24 13:33:24

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH] perf data: export to JSON

On Wed, Mar 24, 2021 at 09:06:50AM -0400, Nicholas Fraser wrote:
> This adds preliminary support to dump the contents of a perf.data file to
> human-readable JSON.
>
> The "perf data" command currently only supports exporting to Common Trace
> Format and it doesn't do symbol resolution among other things. Dumping to JSON
> means the data can be trivially parsed by anything without any dependencies
> (besides a JSON parser.) We use this to import the data into a tool on Windows
> where integrating perf or libbabeltrace is impractical.
>
> The JSON is encoded using some trivial fprintf() commands; there is no
> dependency on any JSON library. It currently only outputs samples. Other stuff
> like processes and mappings could easily be added as needed. The output is of
> course huge but it compresses well enough.
>
> Use it like this:
>
> perf data convert --to-json out.json

Interesting, see below for some minor stuff while others have the chance
to further review this.

I'm OK with how it is right now, though I'm not that well versed in JSON
details.

Do you plan to output the headers too? I think we should, for
completeness.

- Arnaldo

> Here's what the output looks like:
>
> {
>     "linux-perf-json-version": 1,
>     "samples": [
>         {
>             "timestamp": 3074717308597,
>             "pid": 8604,
>             "tid": 8604,
>             "comm": "sh",
>             "callchain": [
>                 {
>                     "ip": "0x7f1e0deb2d36",
>                     "symbol": "__strcmp_avx2",
>                     "dso": "libc-2.33.so"
>                 },
>                 {
>                     "ip": "0x7f1e0dd7f49f",
>                     "symbol": "__gconv_find_transform",
>                     "dso": "libc-2.33.so"
>                 },
>                 {
>                     "ip": "0x7f1e0de0b71c",
>                     "symbol": "__wcsmbs_load_conv",
>                     "dso": "libc-2.33.so"
>                 }
>             ]
>         },
>         ...
>     ]
> }
>
> Signed-off-by: Nicholas Fraser <[email protected]>
> ---
> tools/perf/Documentation/perf-data.txt | 5 +-
> tools/perf/builtin-data.c | 39 ++++-
> tools/perf/util/Build | 1 +
> tools/perf/util/data-convert-json.c | 228 +++++++++++++++++++++++++
> tools/perf/util/data-convert-json.h | 9 +
> tools/perf/util/data-convert.h | 2 +
> 6 files changed, 276 insertions(+), 8 deletions(-)
> create mode 100644 tools/perf/util/data-convert-json.c
> create mode 100644 tools/perf/util/data-convert-json.h
>
> diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
> index 726b9bc9e1a7..417bf17e265c 100644
> --- a/tools/perf/Documentation/perf-data.txt
> +++ b/tools/perf/Documentation/perf-data.txt
> @@ -17,7 +17,7 @@ Data file related processing.
> COMMANDS
> --------
> convert::
> - Converts perf data file into another format (only CTF [1] format is support by now).
> + Converts perf data file into another format.
> It's possible to set data-convert debug variable to get debug messages from conversion,
> like:
> perf --debug data-convert data convert ...
> @@ -27,6 +27,9 @@ OPTIONS for 'convert'
> --to-ctf::
> Triggers the CTF conversion, specify the path of CTF data directory.
>
> +--to-json::
> + Triggers JSON conversion. Specify the JSON filename to output.
> +
> --tod::
> Convert time to wall clock time.
>
> diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
> index 8d23b8d6ee8e..64546ba517a5 100644
> --- a/tools/perf/builtin-data.c
> +++ b/tools/perf/builtin-data.c
> @@ -8,6 +8,7 @@
> #include <subcmd/parse-options.h>
> #include "data-convert.h"
> #include "data-convert-bt.h"
> +#include "data-convert-json.h"
>
> typedef int (*data_cmd_fn_t)(int argc, const char **argv);
>
> @@ -55,7 +56,8 @@ static const char * const data_convert_usage[] = {
>
> static int cmd_data_convert(int argc, const char **argv)
> {
> - const char *to_ctf = NULL;
> + const char *to_json = NULL;
> + const char *to_ctf = NULL;
> struct perf_data_convert_opts opts = {
> .force = false,
> .all = false,
> @@ -63,6 +65,7 @@ static int cmd_data_convert(int argc, const char **argv)
> const struct option options[] = {
> OPT_INCR('v', "verbose", &verbose, "be more verbose"),
> OPT_STRING('i', "input", &input_name, "file", "input file name"),
> + OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"),
> #ifdef HAVE_LIBBABELTRACE_SUPPORT
> OPT_STRING(0, "to-ctf", &to_ctf, NULL, "Convert to CTF format"),
> OPT_BOOLEAN(0, "tod", &opts.tod, "Convert time to wall clock time"),
> @@ -72,11 +75,6 @@ static int cmd_data_convert(int argc, const char **argv)
> OPT_END()
> };
>
> -#ifndef HAVE_LIBBABELTRACE_SUPPORT
> - pr_err("No conversion support compiled in. perf should be compiled with environment variables LIBBABELTRACE=1 and LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");
> - return -1;
> -#endif
> -
> argc = parse_options(argc, argv, options,
> data_convert_usage, 0);
> if (argc) {
> @@ -84,11 +82,38 @@ static int cmd_data_convert(int argc, const char **argv)
> return -1;
> }
>
> + if (to_json && to_ctf) {
> + pr_err("You cannot specify both --to-ctf and --to-json.\n");
> + return -1;
> + }
> + if (!to_json && !to_ctf) {
> + pr_err("You must specify one of --to-ctf or --to-json.\n");
> + return -1;
> + }
> +
> + if (to_json) {
> + if (opts.all) {
> + pr_err("--all is currently unsupported for JSON output.\n");
> + return -1;
> + }
> + if (opts.tod) {
> + pr_err("--tod is currently unsupported for JSON output.\n");
> + return -1;
> + }
> + if (opts.force) {
> + pr_err("--force is currently unsupported for JSON output.\n");
> + return -1;
> + }
> + return bt_convert__perf2json(input_name, to_json, &opts);
> + }
> +
> if (to_ctf) {
> #ifdef HAVE_LIBBABELTRACE_SUPPORT
> return bt_convert__perf2ctf(input_name, to_ctf, &opts);
> #else
> - pr_err("The libbabeltrace support is not compiled in.\n");
> + pr_err("The libbabeltrace support is not compiled in. perf should be "
> + "compiled with environment variables LIBBABELTRACE=1 and "
> + "LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");
> return -1;
> #endif
> }
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index e2563d0154eb..de9ac182b25a 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -163,6 +163,7 @@ perf-$(CONFIG_LIBUNWIND_X86) += libunwind/x86_32.o
> perf-$(CONFIG_LIBUNWIND_AARCH64) += libunwind/arm64.o
>
> perf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
> +perf-y += data-convert-json.o
>
> perf-y += scripting-engines/
>
> diff --git a/tools/perf/util/data-convert-json.c b/tools/perf/util/data-convert-json.c
> new file mode 100644
> index 000000000000..b19674a9f2b8
> --- /dev/null
> +++ b/tools/perf/util/data-convert-json.c
> @@ -0,0 +1,228 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * JSON export.
> + *
> + * Copyright (C) 2021, CodeWeavers Inc. <[email protected]>
> + */
> +
> +#include "data-convert-json.h"
> +
> +#include <unistd.h>
> +#include <inttypes.h>
> +
> +#include "linux/compiler.h"
> +#include "linux/err.h"
> +#include "util/auxtrace.h"
> +#include "util/debug.h"
> +#include "util/dso.h"
> +#include "util/event.h"
> +#include "util/evsel.h"
> +#include "util/header.h"
> +#include "util/map.h"
> +#include "util/session.h"
> +#include "util/symbol.h"
> +#include "util/thread.h"
> +#include "util/tool.h"
> +
> +struct convert_json {
> + struct perf_tool tool;
> + FILE *out;
> + bool first;
> +};
> +
> +static void output_json_string(FILE *out, const char *s)
> +{
> + fputc('"', out);
> + while (*s) {
> + switch (*s) {
> +
> + // required escapes with special forms as per RFC 8259
> + case '"': fprintf(out, "\\\""); break;
> + case '\\': fprintf(out, "\\\\"); break;
> + case '/': fprintf(out, "\\/"); break;
> + case '\b': fprintf(out, "\\b"); break;
> + case '\f': fprintf(out, "\\f"); break;
> + case '\n': fprintf(out, "\\n"); break;
> + case '\r': fprintf(out, "\\r"); break;
> + case '\t': fprintf(out, "\\t"); break;
> +
> + default:
> + // all other control characters must be escaped by hex code
> + if (*s <= 0x1f) {
> + fprintf(out, "\\u%04x", *s);
> + } else {
> + fputc(*s, out);
> + }
> + break;
> + }
> +
> + ++s;
> + }
> + fputc('"', out);
> +}
> +
> +static void output_sample_callchain_entry(struct perf_tool *tool,
> + u64 ip, struct addr_location *al)
> +{
> + struct convert_json *c = container_of(tool, struct convert_json, tool);
> + FILE *out = c->out;
> +
> + fprintf(out, "\n\t\t\t\t{");
> + fprintf(out, "\n\t\t\t\t\t\"ip\": \"0x%" PRIx64 "\"", ip);
> +
> + if (al && al->sym && al->sym->name && strlen(al->sym->name) > 0) {
> + fprintf(out, ",\n\t\t\t\t\t\"symbol\": ");
> + output_json_string(out, al->sym->name);
> +
> + if (al->map && al->map->dso) {
> + const char *dso = al->map->dso->short_name;
> + if (dso && strlen(dso) > 0) {
> + fprintf(out, ",\n\t\t\t\t\t\"dso\": ");
> + output_json_string(out, dso);
> + }
> + }
> + }
> +
> + fprintf(out, "\n\t\t\t\t}");
> +}
> +
> +static int process_sample_event(struct perf_tool *tool,
> + union perf_event *event __maybe_unused,
> + struct perf_sample *sample,
> + struct evsel *evsel __maybe_unused,
> + struct machine *machine)
> +{
> + struct convert_json *c = container_of(tool, struct convert_json, tool);
> + FILE *out = c->out;
> + struct addr_location al, tal;
> + u8 cpumode = PERF_RECORD_MISC_USER;
> +
> + if (machine__resolve(machine, &al, sample) < 0) {
> + return 0;
> + }
> +
> + if (c->first) {
> + c->first = false;
> + } else {
> + fprintf(out, ",");
> + }
> + fprintf(out, "\n\t\t{");
> +
> + fprintf(out, "\n\t\t\t\"timestamp\": %" PRIi64, sample->time);
> + fprintf(out, ",\n\t\t\t\"pid\": %i", al.thread->pid_);
> + fprintf(out, ",\n\t\t\t\"tid\": %i", al.thread->tid);
> +
> + if (al.thread->cpu >= 0) {
> + fprintf(out, ",\n\t\t\t\"cpu\": %i", al.thread->cpu);
> + }
> +
> + fprintf(out, ",\n\t\t\t\"comm\": ");
> + output_json_string(out, thread__comm_str(al.thread));
> +
> + fprintf(out, ",\n\t\t\t\"callchain\": [");
> + if (sample->callchain) {
> + unsigned int i;
> + bool ok;
> + bool first_callchain = true;
> +
> + for (i = 0; i < sample->callchain->nr; ++i) {
> + u64 ip = sample->callchain->ips[i];
> +
> + if (ip >= PERF_CONTEXT_MAX) {
> + switch (ip) {
> + case PERF_CONTEXT_HV:
> + cpumode = PERF_RECORD_MISC_HYPERVISOR;
> + break;
> + case PERF_CONTEXT_KERNEL:
> + cpumode = PERF_RECORD_MISC_KERNEL;
> + break;
> + case PERF_CONTEXT_USER:
> + cpumode = PERF_RECORD_MISC_USER;
> + break;
> + default:
> + pr_debug("invalid callchain context: "
> + "%"PRId64"\n", (s64) ip);
> + break;
> + }
> + continue;
> + }
> +
> + if (first_callchain) {
> + first_callchain = false;
> + } else {
> + fprintf(out, ",");
> + }
> +
> + ok = thread__find_symbol(al.thread, cpumode, ip, &tal);
> + output_sample_callchain_entry(tool, ip, ok ? &tal : NULL);
> + }
> + } else {
> + output_sample_callchain_entry(tool, sample->ip, &al);
> + }
> + fprintf(out, "\n\t\t\t]");
> +
> + fprintf(out, "\n\t\t}");
> + return 0;
> +}
> +
> +int bt_convert__perf2json(const char *input_name, const char *output_name,
> + struct perf_data_convert_opts *opts __maybe_unused)
> +{
> + struct perf_session *session;
> +
> + struct convert_json c = {
> + .tool = {
> + .sample = process_sample_event,
> + .mmap = perf_event__process_mmap,
> + .mmap2 = perf_event__process_mmap2,
> + .comm = perf_event__process_comm,
> + .namespaces = perf_event__process_namespaces,
> + .cgroup = perf_event__process_cgroup,
> + .exit = perf_event__process_exit,
> + .fork = perf_event__process_fork,
> + .lost = perf_event__process_lost,
> + .tracing_data = perf_event__process_tracing_data,
> + .build_id = perf_event__process_build_id,
> + .id_index = perf_event__process_id_index,
> + .auxtrace_info = perf_event__process_auxtrace_info,
> + .auxtrace = perf_event__process_auxtrace,
> + .event_update = perf_event__process_event_update,
> + .ordered_events = true,
> + .ordering_requires_timestamps = true,
> + },

Please align it as in other tools:

	struct convert_json c = {
		.tool = {
			.sample         = process_sample_event,
			.mmap           = perf_event__process_mmap,
			.mmap2          = perf_event__process_mmap2,
			.comm           = perf_event__process_comm,
			.namespaces     = perf_event__process_namespaces,
			.cgroup         = perf_event__process_cgroup,
			.exit           = perf_event__process_exit,
			.fork           = perf_event__process_fork,
			.lost           = perf_event__process_lost,
			.tracing_data   = perf_event__process_tracing_data,
			.build_id       = perf_event__process_build_id,
			.id_index       = perf_event__process_id_index,
			.auxtrace_info  = perf_event__process_auxtrace_info,
			.auxtrace       = perf_event__process_auxtrace,
			.event_update   = perf_event__process_event_update,
			.ordered_events = true,
			.ordering_requires_timestamps = true,
		},

> + .first = true,
> + };
> +
> + struct perf_data data = {
> + .mode = PERF_DATA_MODE_READ,
> + .path = input_name,
> + };
> +
> + c.out = fopen(output_name, "w");
> + if (!c.out) {
> + fprintf(stderr, "error opening output file!\n");
> + return -1;
> + }
> +
> + session = perf_session__new(&data, false, &c.tool);
> + if (IS_ERR(session)) {
> + fprintf(stderr, "error creating perf session!\n");
> + return -1;
> + }
> +
> + if (symbol__init(&session->header.env) < 0) {
> + fprintf(stderr, "symbol init error!\n");
> + return -1;
> + }
> +
> + // Version number for future-proofing. Most additions should be able to be
> + // done in a backwards-compatible way so this should only need to be bumped
> + // if some major breaking change must be made.
> + fprintf(c.out, "{\n\t\"linux-perf-json-version\": 1,");
> +
> + fprintf(c.out, "\n\t\"samples\": [");
> + perf_session__process_events(session);
> + fprintf(c.out, "\n\t]\n}\n");
> +
> + return 0;
> +}
> diff --git a/tools/perf/util/data-convert-json.h b/tools/perf/util/data-convert-json.h
> new file mode 100644
> index 000000000000..1fcac5ce3ec1
> --- /dev/null
> +++ b/tools/perf/util/data-convert-json.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __DATA_CONVERT_JSON_H
> +#define __DATA_CONVERT_JSON_H
> +#include "data-convert.h"
> +
> +int bt_convert__perf2json(const char *input_name, const char *to_ctf,
> + struct perf_data_convert_opts *opts);
> +
> +#endif /* __DATA_CONVERT_JSON_H */
> diff --git a/tools/perf/util/data-convert.h b/tools/perf/util/data-convert.h
> index feab5f114e37..17c35eb6ab4f 100644
> --- a/tools/perf/util/data-convert.h
> +++ b/tools/perf/util/data-convert.h
> @@ -2,6 +2,8 @@
> #ifndef __DATA_CONVERT_H
> #define __DATA_CONVERT_H
>
> +#include <stdbool.h>
> +
> struct perf_data_convert_opts {
> bool force;
> bool all;
> --
> 2.31.0
>
>

--

- Arnaldo

2021-03-24 18:23:37

by Jiri Olsa

Subject: Re: [PATCH] perf data: export to JSON

On Wed, Mar 24, 2021 at 09:06:50AM -0400, Nicholas Fraser wrote:
> This adds preliminary support to dump the contents of a perf.data file to
> human-readable JSON.
>
> The "perf data" command currently only supports exporting to Common Trace
> Format and it doesn't do symbol resolution among other things. Dumping to JSON
> means the data can be trivially parsed by anything without any dependencies
> (besides a JSON parser.) We use this to import the data into a tool on Windows
> where integrating perf or libbabeltrace is impractical.

hi,
exciting ;-) and curious, which tool is that?

>
> The JSON is encoded using some trivial fprintf() commands; there is no
> dependency on any JSON library. It currently only outputs samples. Other stuff
> like processes and mappings could easily be added as needed. The output is of
> course huge but it compresses well enough.

we already have zstd support compiled in for compressing samples,
should be easy to use it for compressing the output of this right
away

SNIP

> argc = parse_options(argc, argv, options,
> data_convert_usage, 0);
> if (argc) {
> @@ -84,11 +82,38 @@ static int cmd_data_convert(int argc, const char **argv)
> return -1;
> }
>
> + if (to_json && to_ctf) {
> + pr_err("You cannot specify both --to-ctf and --to-json.\n");
> + return -1;
> + }
> + if (!to_json && !to_ctf) {
> + pr_err("You must specify one of --to-ctf or --to-json.\n");
> + return -1;
> + }
> +

the option checks below should move into bt_convert__perf2json

> + if (to_json) {
> + if (opts.all) {
> + pr_err("--all is currently unsupported for JSON output.\n");
> + return -1;
> + }
> + if (opts.tod) {
> + pr_err("--tod is currently unsupported for JSON output.\n");
> + return -1;
> + }
> + if (opts.force) {
> + pr_err("--force is currently unsupported for JSON output.\n");
> + return -1;
> + }

I understand not supporting opts.all or opts.tod, but 'force'
support means just assigning 'force' to struct perf_data
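
roughly like this, as a standalone sketch: the structs below are cut-down
stand-ins for struct perf_data and struct perf_data_convert_opts (the real
ones have more fields), and json_input_data() is a name made up for the
example, so take it as an illustration of the idea rather than the actual
change

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the perf structs; the real ones have more fields. */
enum perf_data_mode { PERF_DATA_MODE_WRITE, PERF_DATA_MODE_READ };

struct perf_data {
	const char *path;
	enum perf_data_mode mode;
	bool force;
};

struct perf_data_convert_opts {
	bool force;
	bool all;
	bool tod;
};

/* Hypothetical helper: describe the input file, honouring --force. */
struct perf_data json_input_data(const char *input_name,
				 const struct perf_data_convert_opts *opts)
{
	struct perf_data data = {
		.path  = input_name,
		.mode  = PERF_DATA_MODE_READ,
		.force = opts->force,	/* just pass the option through */
	};

	return data;
}
```

with that in place the "--force is currently unsupported" check can go away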

> + return bt_convert__perf2json(input_name, to_json, &opts);
> + }
> +
> if (to_ctf) {
> #ifdef HAVE_LIBBABELTRACE_SUPPORT
> return bt_convert__perf2ctf(input_name, to_ctf, &opts);
> #else
> - pr_err("The libbabeltrace support is not compiled in.\n");
> + pr_err("The libbabeltrace support is not compiled in. perf should be "
> + "compiled with environment variables LIBBABELTRACE=1 and "
> + "LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");

please indent the 2 continuation lines above so they line up under the start of the "The..." string

SNIP

> +static int process_sample_event(struct perf_tool *tool,
> + union perf_event *event __maybe_unused,
> + struct perf_sample *sample,
> + struct evsel *evsel __maybe_unused,
> + struct machine *machine)
> +{
> + struct convert_json *c = container_of(tool, struct convert_json, tool);
> + FILE *out = c->out;
> + struct addr_location al, tal;
> + u8 cpumode = PERF_RECORD_MISC_USER;
> +
> + if (machine__resolve(machine, &al, sample) < 0) {
> + return 0;

you should fail here, i.e. return an error rather than 0

> + }
> +
> + if (c->first) {
> + c->first = false;
> + } else {
> + fprintf(out, ",");
> + }

no need for curly braces {} when there's just one line of code under the condition

SNIP

> + struct perf_data data = {
> + .mode = PERF_DATA_MODE_READ,
> + .path = input_name,
> + };
> +
> + c.out = fopen(output_name, "w");
> + if (!c.out) {
> + fprintf(stderr, "error opening output file!\n");
> + return -1;
> + }
> +
> + session = perf_session__new(&data, false, &c.tool);
> + if (IS_ERR(session)) {
> + fprintf(stderr, "error creating perf session!\n");
> + return -1;
> + }
> +
> + if (symbol__init(&session->header.env) < 0) {
> + fprintf(stderr, "symbol init error!\n");
> + return -1;
> + }
> +
> + // Version number for future-proofing. Most additions should be able to be
> + // done in a backwards-compatible way so this should only need to be bumped
> + // if some major breaking change must be made.
> + fprintf(c.out, "{\n\t\"linux-perf-json-version\": 1,");
> +
> + fprintf(c.out, "\n\t\"samples\": [");
> + perf_session__process_events(session);
> + fprintf(c.out, "\n\t]\n}\n");

you should close the session with perf_session__delete
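
something like the following unwinding would release both the output file
and the session on every path. This is only a standalone sketch: struct
session, session_new() and session_delete() are stand-ins invented for the
example (in perf they would be perf_session__new() and
perf_session__delete(), and the former reports failure via ERR_PTR rather
than NULL), and convert() with its fail_* parameters is made up so the
error paths can be exercised

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct perf_session, for illustration only. */
struct session { int unused; };

/* Counts live sessions so a leak is easy to spot. */
int live_sessions;

struct session *session_new(int fail)
{
	struct session *s = fail ? NULL : calloc(1, sizeof(*s));

	if (s)
		live_sessions++;
	return s;
}

void session_delete(struct session *s)
{
	if (!s)
		return;
	live_sessions--;
	free(s);
}

/* Returns 0 on success, -1 on error; frees what it acquired on every path. */
int convert(const char *output_name, int fail_session, int fail_init)
{
	struct session *session;
	FILE *out;
	int ret = -1;

	out = fopen(output_name, "w");
	if (!out)
		return -1;

	session = session_new(fail_session);
	if (!session)
		goto err_fclose;

	if (fail_init)		/* stands in for symbol__init() failing */
		goto err_session_delete;

	fprintf(out, "{\n\t\"samples\": [\n\t]\n}\n");
	ret = 0;

err_session_delete:
	session_delete(session);
err_fclose:
	fclose(out);
	return ret;
}
```

the success path falls through the same labels, so there is a single place
where the session and the file are released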

> +
> + return 0;
> +}
> diff --git a/tools/perf/util/data-convert-json.h b/tools/perf/util/data-convert-json.h
> new file mode 100644
> index 000000000000..1fcac5ce3ec1
> --- /dev/null
> +++ b/tools/perf/util/data-convert-json.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __DATA_CONVERT_JSON_H
> +#define __DATA_CONVERT_JSON_H
> +#include "data-convert.h"
> +
> +int bt_convert__perf2json(const char *input_name, const char *to_ctf,
> + struct perf_data_convert_opts *opts);
> +
> +#endif /* __DATA_CONVERT_JSON_H */

I don't remember why we added util/data-convert-bt.h, but it does not
make sense to me now. I think both of these declarations should be in
util/data-convert.h

it can be done as a follow-up on top of this change

thanks,
jirka


> diff --git a/tools/perf/util/data-convert.h b/tools/perf/util/data-convert.h
> index feab5f114e37..17c35eb6ab4f 100644
> --- a/tools/perf/util/data-convert.h
> +++ b/tools/perf/util/data-convert.h
> @@ -2,6 +2,8 @@
> #ifndef __DATA_CONVERT_H
> #define __DATA_CONVERT_H
>
> +#include <stdbool.h>
> +
> struct perf_data_convert_opts {
> bool force;
> bool all;
> --
> 2.31.0
>
>

2021-03-31 10:34:12

by Nicholas Fraser

Subject: Re: [PATCH] perf data: export to JSON

Hi Arnaldo,

Thanks for the review. I'll send a replacement patch with your suggestions.

On 2021-03-24 9:30 a.m., Arnaldo Carvalho de Melo wrote:
> Do you plan to output the headers too? I think we should, for
> completeness.

I've added the headers, at least the ones that seemed important or easy to
output. The result looks like this:

"headers": {
"header-version": 1,
"captured-on": "2021-03-30T19:24:22Z",
"data-offset": 304,
"data-size": 35000,
"feat-offset": 35304,
"hostname": "foundry",
"os-release": "5.11.8-arch1-1",
"arch": "x86_64",
"cpu-desc": "Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz",
"cpuid": "GenuineIntel,6,142,10",
"nrcpus-online": 8,
"nrcpus-avail": 8,
"perf-version": "5.11.gf40ddce88593",
"cmdline": [
"/usr/bin/perf",
"record",
"vkcube"
]
},

(I've thus far avoided outputting anything we don't use; I'm unlikely to design
a useful format for data if I don't have a real use case for it. We will
probably make use of some of the headers though so it's worth doing now.)

Nick
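
Header values such as "cpu-desc" or the "cmdline" entries are arbitrary strings, so they need escaping when written with plain stdio calls. The following is only a sketch of one way to do that; `json_write_string()` and `json_write_string_array()` are hypothetical names, and the escaping policy (quote, backslash, newline, tab, other control characters) is an assumption rather than what the replacement patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write s to out as a quoted, escaped JSON string. */
void json_write_string(FILE *out, const char *s)
{
	assert(out != NULL && s != NULL);

	fputc('"', out);
	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;

		if (strchr("\"\\", c) != NULL)
			fprintf(out, "\\%c", c);	/* quote and backslash */
		else if (c == '\n')
			fputs("\\n", out);
		else if (c == '\t')
			fputs("\\t", out);
		else if (c < 0x20)
			fprintf(out, "\\u%04x", (unsigned int)c);	/* other control chars */
		else
			fputc(c, out);
	}
	fputc('"', out);
}

/* Hypothetical helper: write a JSON array of strings, e.g. for "cmdline". */
void json_write_string_array(FILE *out, const char *const *items, size_t n)
{
	size_t i;

	fputc('[', out);
	for (i = 0; i < n; i++) {
		if (i > 0)
			fputs(", ", out);
		json_write_string(out, items[i]);
	}
	fputc(']', out);
}
```

Bytes of 0x80 and above are passed through unchanged, which assumes the input is already valid UTF-8.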

2021-03-31 10:40:08

by Nicholas Fraser

Subject: Re: [PATCH] perf data: export to JSON

Hi Jiri,

Thanks for the review. I've addressed your suggestions; some notes are
below. I'll send a new patch.


On 2021-03-24 2:20 p.m., Jiri Olsa wrote:
> On Wed, Mar 24, 2021 at 09:06:50AM -0400, Nicholas Fraser wrote:
>> [...] We use this to import the data into a tool on Windows
>> where integrating perf or libbabeltrace is impractical.
>
> hi,
> exciting ;-) and curious, which tool is that?
>

The tool is called gpuvis. The perf JSON parsing support is here:

https://github.com/ludocode/gpuvis

The idea is to be able to line up samples from perf with GPU trace events, so
you can do things like timebox all perf samples in a particular frame of
rendering.


> we already have zstd support compiled in for compressing samples,
> should be easy to use it for compressing the output of this right
> away

This would require that apps that consume this integrate zstd as well. It's
simpler (both conceptually and from an integration standpoint) to just compress
on the command line, if you need to, with whatever compressor you want. You can
even do this inline by writing to /dev/stdout, e.g.:

perf data convert --to-json /dev/stdout --force | zstd > out.json.zstd

Since we're transferring to Windows, more likely we'd output the JSON and then
put it in a .zip container.


> I understand not supporting opts.all or opts.tod, but 'force'
> support means just assigning 'force' to struct perf_data

It's not clear to me what 'force' does on 'struct perf_data' since we're only
reading it. I assumed for data export it meant the output file should be
overwritten. I've made it do both in the replacement patch.


Nick

2021-04-01 17:44:51

by Jiri Olsa

Subject: Re: [PATCH] perf data: export to JSON

On Wed, Mar 31, 2021 at 06:38:16AM -0400, Nicholas Fraser wrote:
> Hi Jiri,
>
> Thanks for the review. I've addressed your suggestions; some notes are
> below. I'll send a new patch.
>
>
> On 2021-03-24 2:20 p.m., Jiri Olsa wrote:
> > On Wed, Mar 24, 2021 at 09:06:50AM -0400, Nicholas Fraser wrote:
> >> [...] We use this to import the data into a tool on Windows
> >> where integrating perf or libbabeltrace is impractical.
> >
> > hi,
> > exciting ;-) and curious, which tool is that?
> >
>
> The tool is called gpuvis. The perf JSON parsing support is here:
>
> https://github.com/ludocode/gpuvis
>
> The idea is to be able to line up samples from perf with GPU trace events, so
> you can do things like timebox all perf samples in a particular frame of
> rendering.

I recall you did not add support for walltime clock,
don't you need it to sync with other events?

>
>
> > we already have zstd support compiled in for compressing samples,
> > should be easy to use it for compressing the output of this right
> > away
>
> This would require that apps that consume this integrate zstd as well. It's
> simpler (both conceptually and from an integration standpoint) to just compress
> on the command line, if you need to, with whatever compressor you want. You can
> even do this inline by writing to /dev/stdout, e.g.:
>
> perf data convert --to-json /dev/stdout --force | zstd > out.json.zstd
>
> Since we're transferring to Windows, more likely we'd output the JSON and then
> put it in a .zip container.
>
>

ok

> > I understand not supporting opts.all or opts.tod, but 'force'
> > support means just assigning 'force' to struct perf_data
>
> It's not clear to me what 'force' does on 'struct perf_data' since we're only
> reading it. I assumed for data export it meant the output file should be
> overwritten. I've made it do both in the replacement patch.
>

it tells perf to skip ownership validation; perf will not
open another user's data file unless it's forced

jirka
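
The ownership rule described above can be sketched roughly as follows. This is an illustration only: `owner_check_passes()` and `may_open_data_file()` are hypothetical names, and perf's real check may include further exceptions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: a data file is accepted if it belongs to the
 * current user, or if the caller passed 'force' to skip the validation.
 */
bool owner_check_passes(uid_t file_owner, uid_t current_user, bool force)
{
	return force || file_owner == current_user;
}

/* Hypothetical wrapper: apply the rule to a path on disk. */
bool may_open_data_file(const char *path, bool force)
{
	struct stat st;

	assert(path != NULL);

	if (stat(path, &st) != 0)
		return false;	/* cannot inspect the file at all */

	return owner_check_passes(st.st_uid, geteuid(), force);
}
```

With this reading, 'force' on the input side only relaxes who may own the file being read; overwriting the output file is a separate decision.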