2019-02-03 16:32:23

by Jiri Olsa

[permalink] [raw]
Subject: [RFC/PATCH 00/14] perf record: Add support to store data in directory

hi,
this patchset adds the --dir option to record command (and all
the other record command that overload cmd_record) that allows
the data to be stored in directory with multiple data files.

It's next step for multiple threads implementation in record.
It's now possible to make directory data via --dir option, like:

$ perf record --dir perf bench sched messaging
$ ls -l perf.data
total 344
-rw-------. 1 jolsa jolsa 43864 Jan 20 22:26 data.0
-rw-------. 1 jolsa jolsa 30464 Jan 20 22:26 data.1
-rw-------. 1 jolsa jolsa 53816 Jan 20 22:26 data.2
-rw-------. 1 jolsa jolsa 30368 Jan 20 22:26 data.3
-rw-------. 1 jolsa jolsa 40088 Jan 20 22:26 data.4
-rw-------. 1 jolsa jolsa 42592 Jan 20 22:26 data.5
-rw-------. 1 jolsa jolsa 56136 Jan 20 22:26 data.6
-rw-------. 1 jolsa jolsa 25992 Jan 20 22:26 data.7
-rw-------. 1 jolsa jolsa 8832 Jan 20 22:26 header

There's a data file created for every cpu and it's storing
data for those cpu maps. The report command will read it
transparently, sort it and display as single file data.

It's possible to transform directory data into standard
perf.data file via simple inject command:

$ perf inject -o perf.data.file -i perf.data

The old perf fails over the directory data with following message:
$ perf report
incompatible file format (rerun with -v to learn more)

I'm now testing the record threads support, so I'd like to
have some agreement on the directory data support before.

It's also available in here:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/dir

thanks,
jirka


---
Jiri Olsa (14):
perf tools: Make rm_rf to remove single file
perf session: Add process callback to reader object
perf data: Move size to struct perf_data_file
perf data: Add global path holder
perf data: Make check_backup work over directories
perf data: Add perf_data__(create_dir|free_dir) functions
perf data: Add perf_data__open_dir_data function
perf data: Add directory support
perf data: Don't store auxtrace index for directory data file
perf data: Add perf_data__update_dir function
perf data: Make perf_data__size to work over directory
perf session: Add __perf_session__process_dir_events function
perf session: Add path to reader object
perf record: Add --dir option to store data in directory

tools/perf/Documentation/perf-record.txt | 3 ++
tools/perf/builtin-annotate.c | 4 +--
tools/perf/builtin-buildid-cache.c | 4 +--
tools/perf/builtin-buildid-list.c | 8 ++---
tools/perf/builtin-c2c.c | 4 +--
tools/perf/builtin-diff.c | 12 +++----
tools/perf/builtin-evlist.c | 4 +--
tools/perf/builtin-inject.c | 10 +++---
tools/perf/builtin-kmem.c | 2 +-
tools/perf/builtin-kvm.c | 8 ++---
tools/perf/builtin-lock.c | 8 ++---
tools/perf/builtin-mem.c | 8 ++---
tools/perf/builtin-record.c | 72 +++++++++++++++++++++++++++++++------
tools/perf/builtin-report.c | 6 ++--
tools/perf/builtin-sched.c | 16 ++++-----
tools/perf/builtin-script.c | 12 +++----
tools/perf/builtin-stat.c | 6 ++--
tools/perf/builtin-timechart.c | 8 ++---
tools/perf/builtin-trace.c | 8 ++---
tools/perf/util/data-convert-bt.c | 2 +-
tools/perf/util/data.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/data.h | 24 +++++++++----
tools/perf/util/mmap.h | 23 ++++++------
tools/perf/util/session.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
tools/perf/util/util.c | 10 ++++--
25 files changed, 479 insertions(+), 131 deletions(-)


2019-02-03 16:34:14

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 01/14] perf tools: Make rm_rf to remove single file

Let rm_rf remove file if it's provided by path.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/util.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 320b0fef249a..58b8d6a8bfbc 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -125,8 +125,14 @@ int rm_rf(const char *path)
char namebuf[PATH_MAX];

dir = opendir(path);
- if (dir == NULL)
- return 0;
+ if (dir == NULL) {
+ /*
+ * The path does not exist or is not directory,
+ * so there's no harm to try remove it. This way
+ * rm_rf will work over single file.
+ */
+ return unlink(path);
+ }

while ((d = readdir(dir)) != NULL && !ret) {
struct stat statbuf;
--
2.17.2


2019-02-03 16:35:59

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 02/14] perf session: Add process callback to reader object

Adding callback function to reader object so
callers can process data in different ways.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/session.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2012396abb7c..b81182b7602a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1835,10 +1835,17 @@ fetch_mmaped_event(struct perf_session *session,
#define NUM_MMAPS 128
#endif

+struct reader;
+
+typedef s64 (*reader_cb_t)(struct perf_session *session,
+ union perf_event *event,
+ u64 file_offset);
+
struct reader {
- int fd;
- u64 data_size;
- u64 data_offset;
+ int fd;
+ u64 data_size;
+ u64 data_offset;
+ reader_cb_t process;
};

static int
@@ -1909,7 +1916,7 @@ reader__process_events(struct reader *rd, struct perf_session *session,
size = event->header.size;

if (size < sizeof(struct perf_event_header) ||
- (skip = perf_session__process_event(session, event, file_pos)) < 0) {
+ (skip = rd->process(session, event, file_pos)) < 0) {
pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
file_offset + head, event->header.size,
event->header.type);
@@ -1935,12 +1942,20 @@ reader__process_events(struct reader *rd, struct perf_session *session,
return err;
}

+static s64 process_simple(struct perf_session *session,
+ union perf_event *event,
+ u64 file_offset)
+{
+ return perf_session__process_event(session, event, file_offset);
+}
+
static int __perf_session__process_events(struct perf_session *session)
{
struct reader rd = {
.fd = perf_data__fd(session->data),
.data_size = session->header.data_size,
.data_offset = session->header.data_offset,
+ .process = process_simple,
};
struct ordered_events *oe = &session->ordered_events;
struct perf_tool *tool = session->tool;
--
2.17.2


2019-02-03 16:37:19

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 03/14] perf data: Move size to struct perf_data_file

We are about to add support for multiple files,
so we need each file to keep its size.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-record.c | 5 ++---
tools/perf/util/data.c | 2 +-
tools/perf/util/data.h | 4 ++--
3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6cb587dcb60d..1ed48b488b50 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -659,10 +659,9 @@ static int process_sample_event(struct perf_tool *tool,

static int process_buildids(struct record *rec)
{
- struct perf_data *data = &rec->data;
struct perf_session *session = rec->session;

- if (data->size == 0)
+ if (perf_data__size(&rec->data) == 0)
return 0;

/*
@@ -850,7 +849,7 @@ record__finish_output(struct record *rec)
return;

rec->session->header.data_size += rec->bytes_written;
- data->size = lseek(perf_data__fd(data), 0, SEEK_CUR);
+ data->file.size = lseek(perf_data__fd(data), 0, SEEK_CUR);

if (!rec->no_buildid) {
process_buildids(rec);
diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index d8cfc19ddb10..09eceda17fc2 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -82,7 +82,7 @@ static int open_file_read(struct perf_data *data)
goto out_close;
}

- data->size = st.st_size;
+ data->file.size = st.st_size;
return fd;

out_close:
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 4828f7feea89..85f9c0dbf982 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -12,13 +12,13 @@ enum perf_data_mode {
struct perf_data_file {
const char *path;
int fd;
+ unsigned long size;
};

struct perf_data {
struct perf_data_file file;
bool is_pipe;
bool force;
- unsigned long size;
enum perf_data_mode mode;
};

@@ -44,7 +44,7 @@ static inline int perf_data__fd(struct perf_data *data)

static inline unsigned long perf_data__size(struct perf_data *data)
{
- return data->size;
+ return data->file.size;
}

int perf_data__open(struct perf_data *data);
--
2.17.2


2019-02-03 16:38:37

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 04/14] perf data: Add global path holder

Adding path to the struct perf_data. It will keep the
configured path for the data (const char*). The path
is struct perf_data_file is now dynamically allocated
(duped) from it.

This scheme is useful/used in following patches where
struct perf_data::path holds the directory path.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-annotate.c | 4 +--
tools/perf/builtin-buildid-cache.c | 4 +--
tools/perf/builtin-buildid-list.c | 8 +++---
tools/perf/builtin-c2c.c | 4 +--
tools/perf/builtin-diff.c | 12 ++++-----
tools/perf/builtin-evlist.c | 4 +--
tools/perf/builtin-inject.c | 10 +++-----
tools/perf/builtin-kmem.c | 2 +-
tools/perf/builtin-kvm.c | 8 +++---
tools/perf/builtin-lock.c | 8 +++---
tools/perf/builtin-mem.c | 8 +++---
tools/perf/builtin-record.c | 6 ++---
tools/perf/builtin-report.c | 6 ++---
tools/perf/builtin-sched.c | 16 +++++-------
tools/perf/builtin-script.c | 12 ++++-----
tools/perf/builtin-stat.c | 6 ++---
tools/perf/builtin-timechart.c | 8 +++---
tools/perf/builtin-trace.c | 8 +++---
tools/perf/util/data-convert-bt.c | 2 +-
tools/perf/util/data.c | 39 +++++++++++++++++++++---------
tools/perf/util/data.h | 3 ++-
21 files changed, 86 insertions(+), 92 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 7f3c3fea67b4..67f9d9ffacfb 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -441,7 +441,7 @@ static int __cmd_annotate(struct perf_annotate *ann)
}

if (total_nr_samples == 0) {
- ui__error("The %s file has no samples!\n", session->data->file.path);
+ ui__error("The %s data has no samples!\n", session->data->path);
goto out;
}

@@ -578,7 +578,7 @@ int cmd_annotate(int argc, const char **argv)
if (quiet)
perf_quiet_option();

- data.file.path = input_name;
+ data.path = input_name;

annotate.session = perf_session__new(&data, false, &annotate.tool);
if (annotate.session == NULL)
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index 115110a4796a..10457b10e568 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -416,8 +416,8 @@ int cmd_buildid_cache(int argc, const char **argv)
nsi = nsinfo__new(ns_id);

if (missing_filename) {
- data.file.path = missing_filename;
- data.force = force;
+ data.path = missing_filename;
+ data.force = force;

session = perf_session__new(&data, false, NULL);
if (session == NULL)
diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index 78abbe8d9d5f..f403e19488b5 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -52,11 +52,9 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
{
struct perf_session *session;
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = force,
};

symbol__elf_init();
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index b2bf117881f1..cc13b87584b8 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2750,8 +2750,8 @@ static int perf_c2c__report(int argc, const char **argv)
if (!input_name || !strlen(input_name))
input_name = "perf.data";

- data.file.path = input_name;
- data.force = symbol_conf.force;
+ data.path = input_name;
+ data.force = symbol_conf.force;

err = setup_display(display);
if (err)
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 751e1971456b..58fe0e88215c 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -708,7 +708,7 @@ static void data__fprintf(void)

data__for_each_file(i, d)
fprintf(stdout, "# [%d] %s %s\n",
- d->idx, d->data.file.path,
+ d->idx, d->data.path,
!d->idx ? "(Baseline)" : "");

fprintf(stdout, "#\n");
@@ -779,14 +779,14 @@ static int __cmd_diff(void)
data__for_each_file(i, d) {
d->session = perf_session__new(&d->data, false, &tool);
if (!d->session) {
- pr_err("Failed to open %s\n", d->data.file.path);
+ pr_err("Failed to open %s\n", d->data.path);
ret = -1;
goto out_delete;
}

ret = perf_session__process_events(d->session);
if (ret) {
- pr_err("Failed to process %s\n", d->data.file.path);
+ pr_err("Failed to process %s\n", d->data.path);
goto out_delete;
}

@@ -1289,9 +1289,9 @@ static int data_init(int argc, const char **argv)
data__for_each_file(i, d) {
struct perf_data *data = &d->data;

- data->file.path = use_default ? defaults[i] : argv[i];
- data->mode = PERF_DATA_MODE_READ,
- data->force = force,
+ data->path = use_default ? defaults[i] : argv[i];
+ data->mode = PERF_DATA_MODE_READ,
+ data->force = force,

d->idx = i;
}
diff --git a/tools/perf/builtin-evlist.c b/tools/perf/builtin-evlist.c
index e06e822ce634..6e4f63b0da4a 100644
--- a/tools/perf/builtin-evlist.c
+++ b/tools/perf/builtin-evlist.c
@@ -23,9 +23,7 @@ static int __cmd_evlist(const char *file_name, struct perf_attr_details *details
struct perf_session *session;
struct perf_evsel *pos;
struct perf_data data = {
- .file = {
- .path = file_name,
- },
+ .path = file_name,
.mode = PERF_DATA_MODE_READ,
.force = details->force,
};
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 9bb1f35d5cb7..24086b7f1b14 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -770,10 +770,8 @@ int cmd_inject(int argc, const char **argv)
.input_name = "-",
.samples = LIST_HEAD_INIT(inject.samples),
.output = {
- .file = {
- .path = "-",
- },
- .mode = PERF_DATA_MODE_WRITE,
+ .path = "-",
+ .mode = PERF_DATA_MODE_WRITE,
},
};
struct perf_data data = {
@@ -786,7 +784,7 @@ int cmd_inject(int argc, const char **argv)
"Inject build-ids into the output stream"),
OPT_STRING('i', "input", &inject.input_name, "file",
"input file name"),
- OPT_STRING('o', "output", &inject.output.file.path, "file",
+ OPT_STRING('o', "output", &inject.output.path, "file",
"output file name"),
OPT_BOOLEAN('s', "sched-stat", &inject.sched_stat,
"Merge sched-stat and sched-switch for getting events "
@@ -834,7 +832,7 @@ int cmd_inject(int argc, const char **argv)

inject.tool.ordered_events = inject.sched_stat;

- data.file.path = inject.input_name;
+ data.path = inject.input_name;
inject.session = perf_session__new(&data, true, &inject.tool);
if (inject.session == NULL)
return -1;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index b80ec0883537..fa520f4b8095 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -1949,7 +1949,7 @@ int cmd_kmem(int argc, const char **argv)
return __cmd_record(argc, argv);
}

- data.file.path = input_name;
+ data.path = input_name;

kmem_session = session = perf_session__new(&data, false, &perf_kmem);
if (session == NULL)
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 3d4cbc4e87c7..dbb6f737a3e2 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1080,11 +1080,9 @@ static int read_events(struct perf_kvm_stat *kvm)
.ordered_events = true,
};
struct perf_data file = {
- .file = {
- .path = kvm->file_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = kvm->force,
+ .path = kvm->file_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = kvm->force,
};

kvm->tool = eops;
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 6e0189df2b3b..b9810a8d350a 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -866,11 +866,9 @@ static int __cmd_report(bool display_info)
.ordered_events = true,
};
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = force,
};

session = perf_session__new(&data, false, &eops);
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index ba7e8d87dec3..f45c8b502f63 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -239,11 +239,9 @@ static int process_sample_event(struct perf_tool *tool,
static int report_raw_events(struct perf_mem *mem)
{
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = mem->force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = mem->force,
};
int ret;
struct perf_session *session = perf_session__new(&data, false,
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1ed48b488b50..fa21efb8bc74 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -917,7 +917,7 @@ record__switch_output(struct record *rec, bool at_exit)

if (!quiet)
fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
- data->file.path, timestamp);
+ data->path, timestamp);

/* Output tracking events */
if (!at_exit) {
@@ -1460,7 +1460,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)

fprintf(stderr, "[ perf record: Captured and wrote %.3f MB %s%s%s ]\n",
perf_data__size(data) / 1024.0 / 1024.0,
- data->file.path, postfix, samples);
+ data->path, postfix, samples);
}

out_delete_session:
@@ -1846,7 +1846,7 @@ static struct option __record_options[] = {
OPT_STRING('C', "cpu", &record.opts.target.cpu_list, "cpu",
"list of cpus to monitor"),
OPT_U64('c', "count", &record.opts.user_interval, "event period to sample"),
- OPT_STRING('o', "output", &record.data.file.path, "file",
+ OPT_STRING('o', "output", &record.data.path, "file",
"output file name"),
OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
&record.opts.no_inherit_set,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a007ea9a3874..15a0de2fe31b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -882,7 +882,7 @@ static int __cmd_report(struct report *rep)
rep->nr_entries += evsel__hists(pos)->nr_entries;

if (rep->nr_entries == 0) {
- ui__error("The %s file has no samples!\n", data->file.path);
+ ui__error("The %s data has no samples!\n", data->path);
return 0;
}

@@ -1190,8 +1190,8 @@ int cmd_report(int argc, const char **argv)
input_name = "perf.data";
}

- data.file.path = input_name;
- data.force = symbol_conf.force;
+ data.path = input_name;
+ data.force = symbol_conf.force;

repeat:
session = perf_session__new(&data, false, &report.tool);
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 640558e9352e..275f2d92a7bf 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1785,11 +1785,9 @@ static int perf_sched__read_events(struct perf_sched *sched)
};
struct perf_session *session;
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = sched->force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = sched->force,
};
int rc = -1;

@@ -2958,11 +2956,9 @@ static int perf_sched__timehist(struct perf_sched *sched)
{ "sched:sched_migrate_task", timehist_migrate_task_event, },
};
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = sched->force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = sched->force,
};

struct perf_session *session;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8d5fe092525c..9101c8812262 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2943,10 +2943,8 @@ int find_scripts(char **scripts_array, char **scripts_path_array)
DIR *scripts_dir, *lang_dir;
struct perf_session *session;
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
};
char *temp;
int i = 0;
@@ -3419,8 +3417,8 @@ int cmd_script(int argc, const char **argv)
argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
PARSE_OPT_STOP_AT_NON_OPTION);

- data.file.path = input_name;
- data.force = symbol_conf.force;
+ data.path = input_name;
+ data.force = symbol_conf.force;

if (argc > 1 && !strncmp(argv[0], "rec", strlen("rec"))) {
rec_script_path = get_script_path(argv[1], RECORD_SUFFIX);
@@ -3646,7 +3644,7 @@ int cmd_script(int argc, const char **argv)
goto out_delete;
}

- input = open(data.file.path, O_RDONLY); /* input_name */
+ input = open(data.path, O_RDONLY); /* input_name */
if (input < 0) {
err = -errno;
perror("failed to open file");
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bb24f9c17f9a..7b8f09b0b8bf 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1322,7 +1322,7 @@ static int __cmd_record(int argc, const char **argv)
PARSE_OPT_STOP_AT_NON_OPTION);

if (output_name)
- data->file.path = output_name;
+ data->path = output_name;

if (stat_config.run_count != 1 || forever) {
pr_err("Cannot use -r option with perf stat record.\n");
@@ -1523,8 +1523,8 @@ static int __cmd_report(int argc, const char **argv)
input_name = "perf.data";
}

- perf_stat.data.file.path = input_name;
- perf_stat.data.mode = PERF_DATA_MODE_READ;
+ perf_stat.data.path = input_name;
+ perf_stat.data.mode = PERF_DATA_MODE_READ;

session = perf_session__new(&perf_stat.data, false, &perf_stat.tool);
if (session == NULL)
diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 775b99833e51..9b98687a27b9 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -1602,11 +1602,9 @@ static int __cmd_timechart(struct timechart *tchart, const char *output_name)
{ "syscalls:sys_exit_select", process_exit_poll },
};
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = tchart->force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = tchart->force,
};

struct perf_session *session = perf_session__new(&data, false,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 81a44954d435..cd82142ccdd2 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3132,11 +3132,9 @@ static int trace__replay(struct trace *trace)
{ "probe:vfs_getname", trace__vfs_getname, },
};
struct perf_data data = {
- .file = {
- .path = input_name,
- },
- .mode = PERF_DATA_MODE_READ,
- .force = trace->force,
+ .path = input_name,
+ .mode = PERF_DATA_MODE_READ,
+ .force = trace->force,
};
struct perf_session *session;
struct perf_evsel *evsel;
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 2a36fab76994..ea742a254aa5 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1650,7 +1650,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,

fprintf(stderr,
"[ perf data convert: Converted '%s' into CTF data '%s' ]\n",
- data.file.path, path);
+ data.path, path);

fprintf(stderr,
"[ perf data convert: Converted and wrote %.3f MB (%" PRIu64 " samples",
diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 09eceda17fc2..e16d06ed1100 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -19,11 +19,11 @@ static bool check_pipe(struct perf_data *data)
int fd = perf_data__is_read(data) ?
STDIN_FILENO : STDOUT_FILENO;

- if (!data->file.path) {
+ if (!data->path) {
if (!fstat(fd, &st) && S_ISFIFO(st.st_mode))
is_pipe = true;
} else {
- if (!strcmp(data->file.path, "-"))
+ if (!strcmp(data->path, "-"))
is_pipe = true;
}

@@ -37,13 +37,13 @@ static int check_backup(struct perf_data *data)
{
struct stat st;

- if (!stat(data->file.path, &st) && st.st_size) {
+ if (!stat(data->path, &st) && st.st_size) {
/* TODO check errors properly */
char oldname[PATH_MAX];
snprintf(oldname, sizeof(oldname), "%s.old",
- data->file.path);
+ data->path);
unlink(oldname);
- rename(data->file.path, oldname);
+ rename(data->path, oldname);
}

return 0;
@@ -115,8 +115,22 @@ static int open_file(struct perf_data *data)
fd = perf_data__is_read(data) ?
open_file_read(data) : open_file_write(data);

+ if (fd < 0) {
+ free(data->file.path);
+ return -1;
+ }
+
data->file.fd = fd;
- return fd < 0 ? -1 : 0;
+ return 0;
+}
+
+static int open_file_dup(struct perf_data *data)
+{
+ data->file.path = strdup(data->path);
+ if (!data->file.path)
+ return -ENOMEM;
+
+ return open_file(data);
}

int perf_data__open(struct perf_data *data)
@@ -124,14 +138,15 @@ int perf_data__open(struct perf_data *data)
if (check_pipe(data))
return 0;

- if (!data->file.path)
- data->file.path = "perf.data";
+ if (!data->path)
+ data->path = "perf.data";

- return open_file(data);
+ return open_file_dup(data);
}

void perf_data__close(struct perf_data *data)
{
+ free(data->file.path);
close(data->file.fd);
}

@@ -159,15 +174,15 @@ int perf_data__switch(struct perf_data *data,
if (perf_data__is_read(data))
return -EINVAL;

- if (asprintf(&new_filepath, "%s.%s", data->file.path, postfix) < 0)
+ if (asprintf(&new_filepath, "%s.%s", data->path, postfix) < 0)
return -ENOMEM;

/*
* Only fire a warning, don't return error, continue fill
* original file.
*/
- if (rename(data->file.path, new_filepath))
- pr_warning("Failed to rename %s to %s\n", data->file.path, new_filepath);
+ if (rename(data->path, new_filepath))
+ pr_warning("Failed to rename %s to %s\n", data->path, new_filepath);

if (!at_exit) {
close(data->file.fd);
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 85f9c0dbf982..2bce28117ccf 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -10,12 +10,13 @@ enum perf_data_mode {
};

struct perf_data_file {
- const char *path;
+ char *path;
int fd;
unsigned long size;
};

struct perf_data {
+ const char *path;
struct perf_data_file file;
bool is_pipe;
bool force;
--
2.17.2


2019-02-03 16:42:19

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 05/14] perf data: Make check_backup work over directories

Changing check_backup to work also over directory
path. Also moving the call earlier in the path,
so it can rename also directory data.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/data.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index e16d06ed1100..0a3051cc0ea0 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -37,12 +37,15 @@ static int check_backup(struct perf_data *data)
{
struct stat st;

+ if (perf_data__is_read(data))
+ return 0;
+
if (!stat(data->path, &st) && st.st_size) {
/* TODO check errors properly */
char oldname[PATH_MAX];
snprintf(oldname, sizeof(oldname), "%s.old",
data->path);
- unlink(oldname);
+ rm_rf(oldname);
rename(data->path, oldname);
}

@@ -95,9 +98,6 @@ static int open_file_write(struct perf_data *data)
int fd;
char sbuf[STRERR_BUFSIZE];

- if (check_backup(data))
- return -1;
-
fd = open(data->file.path, O_CREAT|O_RDWR|O_TRUNC|O_CLOEXEC,
S_IRUSR|S_IWUSR);

@@ -141,6 +141,9 @@ int perf_data__open(struct perf_data *data)
if (!data->path)
data->path = "perf.data";

+ if (check_backup(data))
+ return -1;
+
return open_file_dup(data);
}

--
2.17.2


2019-02-03 16:42:39

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 06/14] perf data: Add perf_data__(create_dir|free_dir) functions

Adding perf_data__create_dir to create nr files inside
struct perf_data path directory:
int perf_data__create_dir(struct perf_data *data, int nr);

and function to free that data:
void perf_data__free_dir(struct perf_data *data);

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/data.c | 47 ++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/data.h | 8 +++++++
2 files changed, 55 insertions(+)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 0a3051cc0ea0..ff1d9e5bd68d 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -7,11 +7,58 @@
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
+#include <asm/bug.h>

#include "data.h"
#include "util.h"
#include "debug.h"

+static void free_dir(struct perf_data_file *files, int nr)
+{
+ while (--nr >= 1) {
+ close(files[nr].fd);
+ free(files[nr].path);
+ }
+ free(files);
+}
+
+void perf_data__free_dir(struct perf_data *data)
+{
+ free_dir(data->dir.files, data->dir.nr);
+}
+
+int perf_data__create_dir(struct perf_data *data, int nr)
+{
+ struct perf_data_file *files = NULL;
+ int i, ret = -1;
+
+ files = malloc(nr * sizeof(*files));
+ if (!files)
+ return -ENOMEM;
+
+ data->dir.files = files;
+ data->dir.nr = nr;
+
+ for (i = 0; i < nr; i++) {
+ struct perf_data_file *file = &files[i];
+
+ if (asprintf(&file->path, "%s/data.%d", data->path, i) < 0)
+ goto out_err;
+
+ ret = open(file->path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
+ if (ret < 0)
+ goto out_err;
+
+ file->fd = ret;
+ }
+
+ return 0;
+
+out_err:
+ free_dir(files, i);
+ return ret;
+}
+
static bool check_pipe(struct perf_data *data)
{
struct stat st;
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 2bce28117ccf..3b4115dc777f 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -21,6 +21,11 @@ struct perf_data {
bool is_pipe;
bool force;
enum perf_data_mode mode;
+
+ struct {
+ struct perf_data_file *files;
+ int nr;
+ } dir;
};

static inline bool perf_data__is_read(struct perf_data *data)
@@ -64,4 +69,7 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
int perf_data__switch(struct perf_data *data,
const char *postfix,
size_t pos, bool at_exit);
+
+int perf_data__create_dir(struct perf_data *data, int nr);
+void perf_data__free_dir(struct perf_data *data);
#endif /* __PERF_DATA_H */
--
2.17.2


2019-02-03 16:43:17

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 08/14] perf data: Add directory support

Adding support to have directory as perf.data.

The caller needs to set 'struct perf_data::is_dir flag
and the path will be treated as directory.

The 'struct perf_data::file' is initialized and open
as 'path/header' file.

Adding check to direcory interface functions to check
on is_dir flag.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/data.c | 39 +++++++++++++++++++++++++++++++++++++--
tools/perf/util/data.h | 6 ++++++
2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 7c37b67fbc3c..59c052454fac 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -34,6 +34,9 @@ int perf_data__create_dir(struct perf_data *data, int nr)
struct perf_data_file *files = NULL;
int i, ret = -1;

+ if (WARN_ON(!data->is_dir))
+ return -EINVAL;
+
files = malloc(nr * sizeof(*files));
if (!files)
return -ENOMEM;
@@ -61,7 +64,6 @@ int perf_data__create_dir(struct perf_data *data, int nr)
return ret;
}

-__maybe_unused
static int perf_data__open_dir(struct perf_data *data)
{
struct perf_data_file *files = NULL;
@@ -70,6 +72,9 @@ static int perf_data__open_dir(struct perf_data *data)
DIR *dir;
int nr = 0;

+ if (WARN_ON(!data->is_dir))
+ return -EINVAL;
+
dir = opendir(data->path);
if (!dir)
return -EINVAL;
@@ -159,6 +164,16 @@ static int check_backup(struct perf_data *data)
return 0;
}

+static bool is_dir(struct perf_data *data)
+{
+ struct stat st;
+
+ if (stat(data->path, &st))
+ return false;
+
+ return (st.st_mode & S_IFMT) == S_IFDIR;
+}
+
static int open_file_read(struct perf_data *data)
{
struct stat st;
@@ -240,6 +255,22 @@ static int open_file_dup(struct perf_data *data)
return open_file(data);
}

+static int open_dir(struct perf_data *data)
+{
+ if (perf_data__is_write(data)) {
+ if (mkdir(data->path, S_IRWXU) < 0)
+ return -1;
+ } else {
+ if (perf_data__open_dir(data))
+ return -1;
+ }
+
+ if (asprintf(&data->file.path, "%s/header", data->path) < 0)
+ return -ENOMEM;
+
+ return open_file(data);
+}
+
int perf_data__open(struct perf_data *data)
{
if (check_pipe(data))
@@ -251,7 +282,11 @@ int perf_data__open(struct perf_data *data)
if (check_backup(data))
return -1;

- return open_file_dup(data);
+ if (perf_data__is_read(data))
+ data->is_dir = is_dir(data);
+
+ return perf_data__is_dir(data) ?
+ open_dir(data) : open_file_dup(data);
}

void perf_data__close(struct perf_data *data)
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 3b4115dc777f..a1edf7e03a71 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -19,6 +19,7 @@ struct perf_data {
const char *path;
struct perf_data_file file;
bool is_pipe;
+ bool is_dir;
bool force;
enum perf_data_mode mode;

@@ -43,6 +44,11 @@ static inline int perf_data__is_pipe(struct perf_data *data)
return data->is_pipe;
}

+static inline bool perf_data__is_dir(struct perf_data *data)
+{
+ return data->is_dir;
+}
+
static inline int perf_data__fd(struct perf_data *data)
{
return data->file.fd;
--
2.17.2


2019-02-03 16:44:52

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 10/14] perf data: Add perf_data__update_dir function

Adding perf_data__update_dir function to update
size for every file within the perf.data directory.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/data.c | 20 ++++++++++++++++++++
tools/perf/util/data.h | 1 +
2 files changed, 21 insertions(+)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 59c052454fac..fbe46dab8dbd 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -124,6 +124,26 @@ static int perf_data__open_dir(struct perf_data *data)
return ret;
}

+int perf_data__update_dir(struct perf_data *data)
+{
+ int i;
+
+ if (WARN_ON(!data->is_dir))
+ return -EINVAL;
+
+ for (i = 0; i < data->dir.nr; i++) {
+ struct perf_data_file *file = &data->dir.files[i];
+ struct stat st;
+
+ if (fstat(file->fd, &st))
+ return -1;
+
+ file->size = st.st_size;
+ }
+
+ return 0;
+}
+
static bool check_pipe(struct perf_data *data)
{
struct stat st;
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index a1edf7e03a71..bd7affb69dba 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -78,4 +78,5 @@ int perf_data__switch(struct perf_data *data,

int perf_data__create_dir(struct perf_data *data, int nr);
void perf_data__free_dir(struct perf_data *data);
+int perf_data__update_dir(struct perf_data *data);
#endif /* __PERF_DATA_H */
--
2.17.2


2019-02-03 16:45:06

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 09/14] perf data: Don't store auxtrace index for directory data file

We can't store auxtrace index when we store to multiple files,
because we keep only offset for it, not the file.

The auxtrace data will be processed correctly in the 'pipe' mode.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-record.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fa21efb8bc74..cd02ab3ec4ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -391,7 +391,7 @@ static int record__process_auxtrace(struct perf_tool *tool,
size_t padding;
u8 pad[8] = {0};

- if (!perf_data__is_pipe(data)) {
+ if (!perf_data__is_pipe(data) && !perf_data__is_dir(data)) {
off_t file_offset;
int fd = perf_data__fd(data);
int err;
--
2.17.2


2019-02-03 16:47:31

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 11/14] perf data: Make perf_data__size to work over directory

Making perf_data__size to return proper size
for directory data.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/data.c | 17 +++++++++++++++++
tools/perf/util/data.h | 6 +-----
2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index fbe46dab8dbd..a36583bced1c 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -367,3 +367,20 @@ int perf_data__switch(struct perf_data *data,
free(new_filepath);
return ret;
}
+
+unsigned long perf_data__size(struct perf_data *data)
+{
+ u64 size = data->file.size;
+ int i;
+
+ if (!data->is_dir)
+ return size;
+
+ for (i = 0; i < data->dir.nr; i++) {
+ struct perf_data_file *file = &data->dir.files[i];
+
+ size += file->size;
+ }
+
+ return size;
+}
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index bd7affb69dba..c1b35e7352df 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -54,11 +54,6 @@ static inline int perf_data__fd(struct perf_data *data)
return data->file.fd;
}

-static inline unsigned long perf_data__size(struct perf_data *data)
-{
- return data->file.size;
-}
-
int perf_data__open(struct perf_data *data);
void perf_data__close(struct perf_data *data);
ssize_t perf_data__write(struct perf_data *data,
@@ -79,4 +74,5 @@ int perf_data__switch(struct perf_data *data,
int perf_data__create_dir(struct perf_data *data, int nr);
void perf_data__free_dir(struct perf_data *data);
int perf_data__update_dir(struct perf_data *data);
+unsigned long perf_data__size(struct perf_data *data);
#endif /* __PERF_DATA_H */
--
2.17.2


2019-02-03 16:48:13

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 13/14] perf session: Add path to reader object

Adding path to reader object, so we can display file
the processing fails for (string in [] brackets).

$ perf report --stdio
0x5e0 [perf.data/data.3] [0xa200]: failed to process type: -1577027574

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/session.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 38d2854cebc1..aa7c4fd1cd2f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1846,6 +1846,7 @@ struct reader {
u64 data_size;
u64 data_offset;
reader_cb_t process;
+ char *path;
};

static int
@@ -1915,8 +1916,8 @@ reader__process_events(struct reader *rd, struct perf_session *session,

if (size < sizeof(struct perf_event_header) ||
(skip = rd->process(session, event, file_pos)) < 0) {
- pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
- file_offset + head, event->header.size,
+ pr_err("%#" PRIx64 " [%s] [%#x]: failed to process type: %d\n",
+ file_offset + head, rd->path, event->header.size,
event->header.type);
err = -EINVAL;
goto out;
@@ -1954,6 +1955,7 @@ static int __perf_session__process_events(struct perf_session *session)
.data_size = session->header.data_size,
.data_offset = session->header.data_offset,
.process = process_simple,
+ .path = session->data->file.path,
};
struct ordered_events *oe = &session->ordered_events;
struct perf_tool *tool = session->tool;
@@ -2028,6 +2030,7 @@ static int __perf_session__process_dir_events(struct perf_session *session)
.data_size = session->header.data_size,
.data_offset = session->header.data_offset,
.process = process_simple,
+ .path = session->data->file.path,
};
int i, ret = 0;
struct ui_progress prog;
@@ -2052,6 +2055,7 @@ static int __perf_session__process_dir_events(struct perf_session *session)
.data_size = file->size,
.data_offset = 0,
.process = process_index,
+ .path = file->path,
};

ret = reader__process_events(&rd, session, &prog);
--
2.17.2


2019-02-03 16:48:33

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 14/14] perf record: Add --dir option to store data in directory

Adding --dir option to store data in directory. It's next
step for multiple threads in record. It's not possible
to make directory data via --dir option, like:

$ perf record --dir perf bench sched messaging
$ ls -l perf.data
total 344
-rw-------. 1 jolsa jolsa 43864 Jan 20 22:26 data.0
-rw-------. 1 jolsa jolsa 30464 Jan 20 22:26 data.1
-rw-------. 1 jolsa jolsa 53816 Jan 20 22:26 data.2
-rw-------. 1 jolsa jolsa 30368 Jan 20 22:26 data.3
-rw-------. 1 jolsa jolsa 40088 Jan 20 22:26 data.4
-rw-------. 1 jolsa jolsa 42592 Jan 20 22:26 data.5
-rw-------. 1 jolsa jolsa 56136 Jan 20 22:26 data.6
-rw-------. 1 jolsa jolsa 25992 Jan 20 22:26 data.7
-rw-------. 1 jolsa jolsa 8832 Jan 20 22:26 header

There's a data file created for every cpu and it's storing
data for those cpu maps.

It's possible to transform directory data into standard
perf.data file via following inject command:

$ perf inject -o perf.data.file -i perf.data

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 3 ++
tools/perf/builtin-record.c | 59 ++++++++++++++++++++++--
tools/perf/util/mmap.h | 23 ++++-----
3 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d232b13ea713..8dcdc8cabcad 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -505,6 +505,9 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.

Implies --tail-synthesize.

+--dir::
+Store data into directory with one data file for cpu.
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cd02ab3ec4ff..87e39b9cc7bd 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -111,17 +111,21 @@ static bool switch_output_time(struct record *rec)
trigger_is_ready(&switch_output_trigger);
}

-static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
+static int record__write(struct record *rec, struct perf_mmap *map,
void *bf, size_t size)
{
- struct perf_data_file *file = &rec->session->data->file;
+ struct perf_data_file *file = &rec->data.file;
+
+ if (map && map->file)
+ file = map->file;

if (perf_data_file__write(file, bf, size) < 0) {
pr_err("failed to write perf data, error: %m\n");
return -1;
}

- rec->bytes_written += size;
+ if (file == &rec->data.file)
+ rec->bytes_written += size;

if (switch_output_size(rec))
trigger_hit(&switch_output_trigger);
@@ -563,6 +567,25 @@ static int record__mmap_evlist(struct record *rec,
return 0;
}

+static int record__mmap_dir_data(struct record *rec)
+{
+ struct perf_evlist *evlist = rec->evlist;
+ struct perf_data *data = &rec->data;
+ int i, ret, nr = evlist->nr_mmaps;
+
+ ret = perf_data__create_dir(data, nr);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < nr; i++) {
+ struct perf_mmap *map = &evlist->mmap[i];
+
+ map->file = &data->dir.files[i];
+ }
+
+ return 0;
+}
+
static int record__mmap(struct record *rec)
{
return record__mmap_evlist(rec, rec->evlist);
@@ -792,8 +815,12 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
/*
* Mark the round finished in case we wrote
* at least one event.
+ *
+ * No need for round events in directory mode,
+ * because per-cpu files/maps have sorted data
+ * from kernel.
*/
- if (bytes_written != rec->bytes_written)
+ if (!perf_data__is_dir(&rec->data) && bytes_written != rec->bytes_written)
rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));

if (overwrite)
@@ -851,6 +878,9 @@ record__finish_output(struct record *rec)
rec->session->header.data_size += rec->bytes_written;
data->file.size = lseek(perf_data__fd(data), 0, SEEK_CUR);

+ if (perf_data__is_dir(data))
+ perf_data__update_dir(data);
+
if (!rec->no_buildid) {
process_buildids(rec);

@@ -1170,11 +1200,23 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (data->is_pipe && rec->evlist->nr_entries == 1)
rec->opts.sample_id = true;

+ if (data->is_pipe && perf_data__is_dir(data)) {
+ pr_err("Directory output is not allowed for pipe output\n");
+ err = -1;
+ goto out_child;
+ }
+
if (record__open(rec) != 0) {
err = -1;
goto out_child;
}

+ if (perf_data__is_dir(data)) {
+ err = record__mmap_dir_data(rec);
+ if (err)
+ goto out_child;
+ }
+
err = bpf__apply_obj_config();
if (err) {
char errbuf[BUFSIZ];
@@ -1962,6 +2004,8 @@ static struct option __record_options[] = {
&nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
record__aio_parse),
#endif
+ OPT_BOOLEAN(0, "dir", &record.data.is_dir,
+ "Store data into directory perf.data"),
OPT_END()
};

@@ -2113,6 +2157,13 @@ int cmd_record(int argc, const char **argv)
goto out;
}

+ if (perf_data__is_dir(&rec->data)) {
+ if (!rec->opts.sample_time) {
+ pr_err("Sample timestamp is required for indexing\n");
+ goto out;
+ }
+ }
+
if (rec->opts.target.tid && !rec->opts.no_inherit_set)
rec->opts.no_inherit = true;

diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index e566c19b242b..3e8595a8d6ce 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -19,17 +19,18 @@ struct aiocb;
* @refcnt - e.g. code using PERF_EVENT_IOC_SET_OUTPUT to share this
*/
struct perf_mmap {
- void *base;
- int mask;
- int fd;
- int cpu;
- refcount_t refcnt;
- u64 prev;
- u64 start;
- u64 end;
- bool overwrite;
- struct auxtrace_mmap auxtrace_mmap;
- char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+ void *base;
+ int mask;
+ int fd;
+ int cpu;
+ refcount_t refcnt;
+ u64 prev;
+ u64 start;
+ u64 end;
+ bool overwrite;
+ struct auxtrace_mmap auxtrace_mmap;
+ struct perf_data_file *file;
+ char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
#ifdef HAVE_AIO_SUPPORT
struct {
void **data;
--
2.17.2


2019-02-03 16:48:47

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 12/14] perf session: Add __perf_session__process_dir_events function

Adding __perf_session__process_dir_events function
to process events over the directory data.

All directory events are pushed into sessions ordered
data and flushed for processing.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/session.c | 86 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index b81182b7602a..38d2854cebc1 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1864,8 +1864,6 @@ reader__process_events(struct reader *rd, struct perf_session *session,
file_offset = page_offset;
head = rd->data_offset - page_offset;

- ui_progress__init_size(prog, data_size, "Processing events...");
-
data_size += rd->data_offset;

mmap_size = MMAP_SIZE;
@@ -1994,6 +1992,87 @@ static int __perf_session__process_events(struct perf_session *session)
return err;
}

+static s64 process_index(struct perf_session *session,
+ union perf_event *event,
+ u64 file_offset)
+{
+ struct perf_evlist *evlist = session->evlist;
+ u64 timestamp;
+ s64 ret;
+
+ if (session->header.needs_swap)
+ event_swap(event, perf_evlist__sample_id_all(evlist));
+
+ if (event->header.type >= PERF_RECORD_HEADER_MAX)
+ return -EINVAL;
+
+ events_stats__inc(&evlist->stats, event->header.type);
+
+ if (event->header.type >= PERF_RECORD_USER_TYPE_START)
+ return perf_session__process_user_event(session, event, file_offset);
+
+ ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
+ if (ret)
+ return ret;
+
+ return ordered_events__queue(&session->ordered_events, event,
+ timestamp, file_offset);
+}
+
+static int __perf_session__process_dir_events(struct perf_session *session)
+{
+ struct perf_data *data = session->data;
+ struct perf_tool *tool = session->tool;
+ struct reader rd = {
+ .fd = perf_data__fd(session->data),
+ .data_size = session->header.data_size,
+ .data_offset = session->header.data_offset,
+ .process = process_simple,
+ };
+ int i, ret = 0;
+ struct ui_progress prog;
+ u64 total_size = perf_data__size(session->data);
+
+ perf_tool__fill_defaults(tool);
+
+ ui_progress__init_size(&prog, total_size, "Processing events...");
+
+ ret = reader__process_events(&rd, session, &prog);
+ if (ret)
+ goto out_err;
+
+ for (i = 0; i < data->dir.nr ; i++) {
+ struct perf_data_file *file = &data->dir.files[i];
+
+ if (file->size == 0)
+ continue;
+
+ rd = (struct reader) {
+ .fd = file->fd,
+ .data_size = file->size,
+ .data_offset = 0,
+ .process = process_index,
+ };
+
+ ret = reader__process_events(&rd, session, &prog);
+ if (ret)
+ goto out_err;
+ }
+
+ ret = ordered_events__flush(&session->ordered_events, OE_FLUSH__FINAL);
+
+out_err:
+ if (!tool->no_warn)
+ perf_session__warn_about_errors(session);
+
+ /*
+ * We may switching perf.data output, make ordered_events
+ * reusable.
+ */
+ ordered_events__reinit(&session->ordered_events);
+ return ret;
+}
+
int perf_session__process_events(struct perf_session *session)
{
if (perf_session__register_idle_thread(session) < 0)
@@ -2002,6 +2081,9 @@ int perf_session__process_events(struct perf_session *session)
if (perf_data__is_pipe(session->data))
return __perf_session__process_pipe_events(session);

+ if (perf_data__is_dir(session->data))
+ return __perf_session__process_dir_events(session);
+
return __perf_session__process_events(session);
}

--
2.17.2


2019-02-03 16:50:22

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 07/14] perf data: Add perf_data__open_dir_data function

Adding perf_data__open_dir_data to open files inside
struct perf_data path directory:
static int perf_data__open_dir(struct perf_data *data);

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/data.c | 60 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index ff1d9e5bd68d..7c37b67fbc3c 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -8,6 +8,8 @@
#include <unistd.h>
#include <string.h>
#include <asm/bug.h>
+#include <sys/types.h>
+#include <dirent.h>

#include "data.h"
#include "util.h"
@@ -59,6 +61,64 @@ int perf_data__create_dir(struct perf_data *data, int nr)
return ret;
}

+__maybe_unused
+static int perf_data__open_dir(struct perf_data *data)
+{
+ struct perf_data_file *files = NULL;
+ struct dirent *dent;
+ int ret = -1;
+ DIR *dir;
+ int nr = 0;
+
+ dir = opendir(data->path);
+ if (!dir)
+ return -EINVAL;
+
+ while ((dent = readdir(dir)) != NULL) {
+ struct perf_data_file *file;
+ char path[PATH_MAX];
+ struct stat st;
+
+ snprintf(path, sizeof(path), "%s/%s", data->path, dent->d_name);
+ if (stat(path, &st))
+ continue;
+
+ if (!S_ISREG(st.st_mode) || strncmp(dent->d_name, "data", 4))
+ continue;
+
+ ret = -ENOMEM;
+
+ file = realloc(files, (nr + 1) * sizeof(*files));
+ if (!file)
+ goto out_err;
+
+ files = file;
+ file = &files[nr++];
+
+ file->path = strdup(path);
+ if (!file->path)
+ goto out_err;
+
+ ret = open(file->path, O_RDONLY);
+ if (ret < 0)
+ goto out_err;
+
+ file->fd = ret;
+ file->size = st.st_size;
+ }
+
+ if (!files)
+ return -EINVAL;
+
+ data->dir.files = files;
+ data->dir.nr = nr;
+ return 0;
+
+out_err:
+ free_dir(files, nr);
+ return ret;
+}
+
static bool check_pipe(struct perf_data *data)
{
struct stat st;
--
2.17.2


2019-02-04 10:14:00

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory


Hi,

On 03.02.2019 18:30, Jiri Olsa wrote:
> hi,
> this patchset adds the --dir option to record command (and all
> the other record command that overload cmd_record) that allows
> the data to be stored in directory with multiple data files.
>
> It's next step for multiple threads implementation in record.
> It's now possible to make directory data via --dir option, like:
>
> $ perf record --dir perf bench sched messaging

Is it possible to name data directory differently from perf.data
e.g. using --output option, like this?

$ perf record --output result_1 --dir perf bench sched messaging

Thanks,
Alexey

> $ ls -l perf.data
> total 344
> -rw-------. 1 jolsa jolsa 43864 Jan 20 22:26 data.0
> -rw-------. 1 jolsa jolsa 30464 Jan 20 22:26 data.1
> -rw-------. 1 jolsa jolsa 53816 Jan 20 22:26 data.2
> -rw-------. 1 jolsa jolsa 30368 Jan 20 22:26 data.3
> -rw-------. 1 jolsa jolsa 40088 Jan 20 22:26 data.4
> -rw-------. 1 jolsa jolsa 42592 Jan 20 22:26 data.5
> -rw-------. 1 jolsa jolsa 56136 Jan 20 22:26 data.6
> -rw-------. 1 jolsa jolsa 25992 Jan 20 22:26 data.7
> -rw-------. 1 jolsa jolsa 8832 Jan 20 22:26 header
>
> There's a data file created for every cpu and it's storing
> data for those cpu maps. The report command will read it
> transparently, sort it and display as single file data.
>
> It's possible to transform directory data into standard
> perf.data file via simple inject command:
>
> $ perf inject -o perf.data.file -i perf.data
>
> The old perf fails over the directory data with following message:
> $ perf report
> incompatible file format (rerun with -v to learn more)
>
> I'm now testing the record threads support, so I'd like to
> have some agreement on the directory data support before.
>
> It's also available in here:
> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> perf/dir
>
> thanks,
> jirka
>
>
> ---
> Jiri Olsa (14):
> perf tools: Make rm_rf to remove single file
> perf session: Add process callback to reader object
> perf data: Move size to struct perf_data_file
> perf data: Add global path holder
> perf data: Make check_backup work over directories
> perf data: Add perf_data__(create_dir|free_dir) functions
> perf data: Add perf_data__open_dir_data function
> perf data: Add directory support
> perf data: Don't store auxtrace index for directory data file
> perf data: Add perf_data__update_dir function
> perf data: Make perf_data__size to work over directory
> perf session: Add __perf_session__process_dir_events function
> perf session: Add path to reader object
> perf record: Add --dir option to store data in directory
>
> tools/perf/Documentation/perf-record.txt | 3 ++
> tools/perf/builtin-annotate.c | 4 +--
> tools/perf/builtin-buildid-cache.c | 4 +--
> tools/perf/builtin-buildid-list.c | 8 ++---
> tools/perf/builtin-c2c.c | 4 +--
> tools/perf/builtin-diff.c | 12 +++----
> tools/perf/builtin-evlist.c | 4 +--
> tools/perf/builtin-inject.c | 10 +++---
> tools/perf/builtin-kmem.c | 2 +-
> tools/perf/builtin-kvm.c | 8 ++---
> tools/perf/builtin-lock.c | 8 ++---
> tools/perf/builtin-mem.c | 8 ++---
> tools/perf/builtin-record.c | 72 +++++++++++++++++++++++++++++++------
> tools/perf/builtin-report.c | 6 ++--
> tools/perf/builtin-sched.c | 16 ++++-----
> tools/perf/builtin-script.c | 12 +++----
> tools/perf/builtin-stat.c | 6 ++--
> tools/perf/builtin-timechart.c | 8 ++---
> tools/perf/builtin-trace.c | 8 ++---
> tools/perf/util/data-convert-bt.c | 2 +-
> tools/perf/util/data.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/data.h | 24 +++++++++----
> tools/perf/util/mmap.h | 23 ++++++------
> tools/perf/util/session.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> tools/perf/util/util.c | 10 ++++--
> 25 files changed, 479 insertions(+), 131 deletions(-)
>

2019-02-04 10:39:38

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
>
> Hi,
>
> On 03.02.2019 18:30, Jiri Olsa wrote:
> > hi,
> > this patchset adds the --dir option to record command (and all
> > the other record command that overload cmd_record) that allows
> > the data to be stored in directory with multiple data files.
> >
> > It's next step for multiple threads implementation in record.
> > It's now possible to make directory data via --dir option, like:
> >
> > $ perf record --dir perf bench sched messaging
>
> Is it possible to name data directory differently from perf.data
> e.g. using --output option, like this?
>
> $ perf record --output result_1 --dir perf bench sched messaging


yep, it's taken into account:

[jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
Couldn't synthesize bpf events.
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.177 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]

[jolsa@krava perf]$ ll result_1/
total 348
-rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
-rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
-rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
-rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
-rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
-rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
-rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
-rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
-rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header

jirka

2019-02-04 14:00:12

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On 04.02.2019 13:36, Jiri Olsa wrote:
> On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
>>
>> Hi,
>>
>> On 03.02.2019 18:30, Jiri Olsa wrote:
>>> hi,
>>> this patchset adds the --dir option to record command (and all
>>> the other record command that overload cmd_record) that allows
>>> the data to be stored in directory with multiple data files.
>>>
>>> It's next step for multiple threads implementation in record.
>>> It's now possible to make directory data via --dir option, like:
>>>
>>> $ perf record --dir perf bench sched messaging
>>
>> Is it possible to name data directory differently from perf.data
>> e.g. using --output option, like this?
>>
>> $ perf record --output result_1 --dir perf bench sched messaging
>
>
> yep, it's taken into account:
>
> [jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
> Couldn't synthesize bpf events.
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.177 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]
>
> [jolsa@krava perf]$ ll result_1/
> total 348
> -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header

Awesome. What do you think about having it like this:

$ perf record --output result_1.data ... - writes data to a file

$ perf record --dir result_1 ... - or even
$ perf record --output_dir result_1 ... - writes data into a directory

IMHO, this interface is simpler for a user.

Thanks,
Alexey

>
> jirka
>

2019-02-04 14:03:40

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 04, 2019 at 02:29:56PM +0300, Alexey Budankov wrote:
> On 04.02.2019 13:36, Jiri Olsa wrote:
> > On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
> >>
> >> Hi,
> >>
> >> On 03.02.2019 18:30, Jiri Olsa wrote:
> >>> hi,
> >>> this patchset adds the --dir option to record command (and all
> >>> the other record command that overload cmd_record) that allows
> >>> the data to be stored in directory with multiple data files.
> >>>
> >>> It's next step for multiple threads implementation in record.
> >>> It's now possible to make directory data via --dir option, like:
> >>>
> >>> $ perf record --dir perf bench sched messaging
> >>
> >> Is it possible to name data directory differently from perf.data
> >> e.g. using --output option, like this?
> >>
> >> $ perf record --output result_1 --dir perf bench sched messaging
> >
> >
> > yep, it's taken into account:
> >
> > [jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
> > Couldn't synthesize bpf events.
> > # Running 'sched/messaging' benchmark:
> > # 20 sender and receiver processes per group
> > # 10 groups == 400 processes run
> >
> > Total time: 0.177 [sec]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]
> >
> > [jolsa@krava perf]$ ll result_1/
> > total 348
> > -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> > -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> > -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> > -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> > -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> > -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> > -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> > -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> > -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
>
> Awesome. What do you think about having it like this:
>
> $ perf record --output result_1.data ... - writes data to a file
>
> $ perf record --dir result_1 ... - or even
> $ perf record --output_dir result_1 ... - writes data into a directory
>
> IMHO, this interface is simpler for a user.

yep, seems more convenient.. I'll add it

thanks,
jirka

2019-02-04 18:57:11

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 4, 2019 at 3:41 AM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Feb 04, 2019 at 02:29:56PM +0300, Alexey Budankov wrote:
> > On 04.02.2019 13:36, Jiri Olsa wrote:
> > > On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
> > >>
> > >> Hi,
> > >>
> > >> On 03.02.2019 18:30, Jiri Olsa wrote:
> > >>> hi,
> > >>> this patchset adds the --dir option to record command (and all
> > >>> the other record command that overload cmd_record) that allows
> > >>> the data to be stored in directory with multiple data files.
> > >>>
> > >>> It's next step for multiple threads implementation in record.
> > >>> It's now possible to make directory data via --dir option, like:
> > >>>
> > >>> $ perf record --dir perf bench sched messaging
> > >>
> > >> Is it possible to name data directory differently from perf.data
> > >> e.g. using --output option, like this?
> > >>
> > >> $ perf record --output result_1 --dir perf bench sched messaging
> > >
> > >
> > > yep, it's taken into account:
> > >
> > > [jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
> > > Couldn't synthesize bpf events.
> > > # Running 'sched/messaging' benchmark:
> > > # 20 sender and receiver processes per group
> > > # 10 groups == 400 processes run
> > >
> > > Total time: 0.177 [sec]
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]
> > >
> > > [jolsa@krava perf]$ ll result_1/
> > > total 348
> > > -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> > > -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> > > -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> > > -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> > > -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> > > -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> > > -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> > > -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> > > -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
> >
> > Awesome. What do you think about having it like this:
> >
> > $ perf record --output result_1.data ... - writes data to a file
> >
> > $ perf record --dir result_1 ... - or even
> > $ perf record --output_dir result_1 ... - writes data into a directory
> >
> > IMHO, this interface is simpler for a user.
>
> yep, seems more convenient.. I'll add it
>
But what happens if you do: perf record -o foo --output_dir foo.d?

> thanks,
> jirka

2019-02-04 19:28:22

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Em Mon, Feb 04, 2019 at 10:56:05AM -0800, Stephane Eranian escreveu:
> On Mon, Feb 4, 2019 at 3:41 AM Jiri Olsa <[email protected]> wrote:
> >
> > On Mon, Feb 04, 2019 at 02:29:56PM +0300, Alexey Budankov wrote:
> > > On 04.02.2019 13:36, Jiri Olsa wrote:
> > > > On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> On 03.02.2019 18:30, Jiri Olsa wrote:
> > > >>> hi,
> > > >>> this patchset adds the --dir option to record command (and all
> > > >>> the other record command that overload cmd_record) that allows
> > > >>> the data to be stored in directory with multiple data files.
> > > >>>
> > > >>> It's next step for multiple threads implementation in record.
> > > >>> It's now possible to make directory data via --dir option, like:
> > > >>>
> > > >>> $ perf record --dir perf bench sched messaging
> > > >>
> > > >> Is it possible to name data directory differently from perf.data
> > > >> e.g. using --output option, like this?
> > > >>
> > > >> $ perf record --output result_1 --dir perf bench sched messaging
> > > >
> > > >
> > > > yep, it's taken into account:
> > > >
> > > > [jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
> > > > Couldn't synthesize bpf events.
> > > > # Running 'sched/messaging' benchmark:
> > > > # 20 sender and receiver processes per group
> > > > # 10 groups == 400 processes run
> > > >
> > > > Total time: 0.177 [sec]
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]
> > > >
> > > > [jolsa@krava perf]$ ll result_1/
> > > > total 348
> > > > -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> > > > -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> > > > -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> > > > -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> > > > -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> > > > -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> > > > -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> > > > -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> > > > -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
> > >
> > > Awesome. What do you think about having it like this:
> > >
> > > $ perf record --output result_1.data ... - writes data to a file
> > >
> > > $ perf record --dir result_1 ... - or even
> > > $ perf record --output_dir result_1 ... - writes data into a directory
> > >
> > > IMHO, this interface is simpler for a user.
> >
> > yep, seems more convenient.. I'll add it
> >
> But what happens if you do: perf record -o foo --output_dir foo.d?

Should fail, i.e. either you use single-file or directory output, I
think.

- Arnaldo

2019-02-04 21:34:28

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On 04.02.2019 22:27, Arnaldo Carvalho de Melo wrote:
> Em Mon, Feb 04, 2019 at 10:56:05AM -0800, Stephane Eranian escreveu:
>> On Mon, Feb 4, 2019 at 3:41 AM Jiri Olsa <[email protected]> wrote:
>>>
>>> On Mon, Feb 04, 2019 at 02:29:56PM +0300, Alexey Budankov wrote:
>>>> On 04.02.2019 13:36, Jiri Olsa wrote:
>>>>> On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 03.02.2019 18:30, Jiri Olsa wrote:
>>>>>>> hi,
>>>>>>> this patchset adds the --dir option to record command (and all
>>>>>>> the other record command that overload cmd_record) that allows
>>>>>>> the data to be stored in directory with multiple data files.
>>>>>>>
>>>>>>> It's next step for multiple threads implementation in record.
>>>>>>> It's now possible to make directory data via --dir option, like:
>>>>>>>
>>>>>>> $ perf record --dir perf bench sched messaging
>>>>>>
>>>>>> Is it possible to name data directory differently from perf.data
>>>>>> e.g. using --output option, like this?
>>>>>>
>>>>>> $ perf record --output result_1 --dir perf bench sched messaging
>>>>>
>>>>>
>>>>> yep, it's taken into account:
>>>>>
>>>>> [jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
>>>>> Couldn't synthesize bpf events.
>>>>> # Running 'sched/messaging' benchmark:
>>>>> # 20 sender and receiver processes per group
>>>>> # 10 groups == 400 processes run
>>>>>
>>>>> Total time: 0.177 [sec]
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]
>>>>>
>>>>> [jolsa@krava perf]$ ll result_1/
>>>>> total 348
>>>>> -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
>>>>> -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
>>>>> -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
>>>>> -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
>>>>> -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
>>>>> -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
>>>>> -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
>>>>> -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
>>>>> -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
>>>>
>>>> Awesome. What do you think about having it like this:
>>>>
>>>> $ perf record --output result_1.data ... - writes data to a file
>>>>
>>>> $ perf record --dir result_1 ... - or even
>>>> $ perf record --output_dir result_1 ... - writes data into a directory
>>>>
>>>> IMHO, this interface is simpler for a user.
>>>
>>> yep, seems more convenient.. I'll add it
>>>
>> But what happens if you do: perf record -o foo --output_dir foo.d?
>
> Should fail, i.e. either you use single-file or directory output, I
> think.

I also expect it this way.

Alexey

>
> - Arnaldo
>

2019-02-04 21:40:55

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 4, 2019 at 11:27 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Mon, Feb 04, 2019 at 10:56:05AM -0800, Stephane Eranian escreveu:
> > On Mon, Feb 4, 2019 at 3:41 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > On Mon, Feb 04, 2019 at 02:29:56PM +0300, Alexey Budankov wrote:
> > > > On 04.02.2019 13:36, Jiri Olsa wrote:
> > > > > On Mon, Feb 04, 2019 at 01:12:11PM +0300, Alexey Budankov wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >> On 03.02.2019 18:30, Jiri Olsa wrote:
> > > > >>> hi,
> > > > >>> this patchset adds the --dir option to record command (and all
> > > > >>> the other record command that overload cmd_record) that allows
> > > > >>> the data to be stored in directory with multiple data files.
> > > > >>>
> > > > >>> It's next step for multiple threads implementation in record.
> > > > >>> It's now possible to make directory data via --dir option, like:
> > > > >>>
> > > > >>> $ perf record --dir perf bench sched messaging
> > > > >>
> > > > >> Is it possible to name data directory differently from perf.data
> > > > >> e.g. using --output option, like this?
> > > > >>
> > > > >> $ perf record --output result_1 --dir perf bench sched messaging
> > > > >
> > > > >
> > > > > yep, it's taken into account:
> > > > >
> > > > > [jolsa@krava perf]$ ./perf record --output result_1 --dir ./perf bench sched messaging
> > > > > Couldn't synthesize bpf events.
> > > > > # Running 'sched/messaging' benchmark:
> > > > > # 20 sender and receiver processes per group
> > > > > # 10 groups == 400 processes run
> > > > >
> > > > > Total time: 0.177 [sec]
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 0.316 MB result_1 (7225 samples) ]
> > > > >
> > > > > [jolsa@krava perf]$ ll result_1/
> > > > > total 348
> > > > > -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> > > > > -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> > > > > -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> > > > > -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> > > > > -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> > > > > -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> > > > > -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> > > > > -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> > > > > -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
> > > >
> > > > Awesome. What do you think about having it like this:
> > > >
> > > > $ perf record --output result_1.data ... - writes data to a file
> > > >
> > > > $ perf record --dir result_1 ... - or even
> > > > $ perf record --output_dir result_1 ... - writes data into a directory
> > > >
> > > > IMHO, this interface is simpler for a user.
> > >
> > > yep, seems more convenient.. I'll add it
> > >
> > But what happens if you do: perf record -o foo --output_dir foo.d?
>
> Should fail, i.e. either you use single-file or directory output, I
> think.
>
Agreed
> - Arnaldo

2019-02-04 21:47:57

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 04, 2019 at 12:05:38PM -0800, Stephane Eranian wrote:

SNIP

> > > > > >
> > > > > > [jolsa@krava perf]$ ll result_1/
> > > > > > total 348
> > > > > > -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> > > > > > -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> > > > > > -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> > > > > > -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> > > > > > -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> > > > > > -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> > > > > > -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> > > > > > -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> > > > > > -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
> > > > >
> > > > > Awesome. What do you think about having it like this:
> > > > >
> > > > > $ perf record --output result_1.data ... - writes data to a file
> > > > >
> > > > > $ perf record --dir result_1 ... - or even
> > > > > $ perf record --output_dir result_1 ... - writes data into a directory
> > > > >
> > > > > IMHO, this interface is simpler for a user.
> > > >
> > > > yep, seems more convenient.. I'll add it
> > > >
> > > But what happens if you do: perf record -o foo --output_dir foo.d?
> >
> > Should fail, i.e. either you use single-file or directory output, I
> > think.
> >
> Agreed

ok, will do that

thanks,
jirka

2019-02-04 23:18:20

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Jiri,

While you're looking at the output format, I think it would be good
time to simplify the code handling perf.data file.
Today, perf record can emit in two formats: file mode or pipe mode.
This adds complexity in the code and
is error prone as the file mode path is tested more than the pipe mode
path. We have run into multiple issues with
the pipe mode in recent years. There is no real reason why we need to
maintain two formats. If I recall, the pipe format
was introduced because on pipes you cannot lseek to update the headers
and therefore some of the information present as tables
updated on the fly needed to be generated as pseudo records by the
tool. I believe that the pipe format covers all the needs and could
supersede the file mode format. That would simplify code in perf
record and eliminate the risk of errors when new headers
are introduced.

Related to your effort to make perf record multi-threaded, I was
wondering how this would impact pipe mode.
Could you explain?

Thanks.


On Mon, Feb 4, 2019 at 12:28 PM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Feb 04, 2019 at 12:05:38PM -0800, Stephane Eranian wrote:
>
> SNIP
>
> > > > > > >
> > > > > > > [jolsa@krava perf]$ ll result_1/
> > > > > > > total 348
> > > > > > > -rw-------. 1 jolsa jolsa 27624 Feb 4 11:35 data.0
> > > > > > > -rw-------. 1 jolsa jolsa 56672 Feb 4 11:35 data.1
> > > > > > > -rw-------. 1 jolsa jolsa 30824 Feb 4 11:35 data.2
> > > > > > > -rw-------. 1 jolsa jolsa 49136 Feb 4 11:35 data.3
> > > > > > > -rw-------. 1 jolsa jolsa 22712 Feb 4 11:35 data.4
> > > > > > > -rw-------. 1 jolsa jolsa 53392 Feb 4 11:35 data.5
> > > > > > > -rw-------. 1 jolsa jolsa 43352 Feb 4 11:35 data.6
> > > > > > > -rw-------. 1 jolsa jolsa 46688 Feb 4 11:35 data.7
> > > > > > > -rw-------. 1 jolsa jolsa 9068 Feb 4 11:35 header
> > > > > >
> > > > > > Awesome. What do you think about having it like this:
> > > > > >
> > > > > > $ perf record --output result_1.data ... - writes data to a file
> > > > > >
> > > > > > $ perf record --dir result_1 ... - or even
> > > > > > $ perf record --output_dir result_1 ... - writes data into a directory
> > > > > >
> > > > > > IMHO, this interface is simpler for a user.
> > > > >
> > > > > yep, seems more convenient.. I'll add it
> > > > >
> > > > But what happens if you do: perf record -o foo --output_dir foo.d?
> > >
> > > Should fail, i.e. either you use single-file or directory output, I
> > > think.
> > >
> > Agreed
>
> ok, will do that
>
> thanks,
> jirka

2019-02-05 11:41:18

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH 01/14] perf tools: Make rm_rf to remove single file


On 03.02.2019 18:30, Jiri Olsa wrote:
> Let rm_rf remove file if it's provided by path.
>
> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/util.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index 320b0fef249a..58b8d6a8bfbc 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -125,8 +125,14 @@ int rm_rf(const char *path)
> char namebuf[PATH_MAX];
>
> dir = opendir(path);
> - if (dir == NULL)
> - return 0;
> + if (dir == NULL) {
> + /*
> + * The path does not exist or is not directory,
> + * so there's no harm to try remove it. This way
> + * rm_rf will work over single file.
> + */

This can also happen due to lack of fds or memory.
Not sure file still has to be deleted in these cases.

- Alexey

> + return unlink(path);
> + }
>
> while ((d = readdir(dir)) != NULL && !ret) {
> struct stat statbuf;
>

2019-02-05 11:53:47

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH 06/14] perf data: Add perf_data__(create_dir|free_dir) functions


On 03.02.2019 18:30, Jiri Olsa wrote:
> Adding perf_data__create_dir to create nr files inside
> struct perf_data path directory:
> int perf_data__create_dir(struct perf_data *data, int nr);
>
> and function to free that data:
> void perf_data__free_dir(struct perf_data *data);
>
> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/data.c | 47 ++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/data.h | 8 +++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index 0a3051cc0ea0..ff1d9e5bd68d 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -7,11 +7,58 @@
> #include <fcntl.h>
> #include <unistd.h>
> #include <string.h>
> +#include <asm/bug.h>
>
> #include "data.h"
> #include "util.h"
> #include "debug.h"
>
> +static void free_dir(struct perf_data_file *files, int nr)
> +{
> + while (--nr >= 1) {
> + close(files[nr].fd);
> + free(files[nr].path);
> + }
> + free(files);

It implements closing of created files and frees corresponding memory.
However it doesn't delete the files what looks like the proper rollback
from perf_data__create_dir() in case of some open() failure.

> +}
> +
> +void perf_data__free_dir(struct perf_data *data)
> +{
> + free_dir(data->dir.files, data->dir.nr);
> +}
> +
> +int perf_data__create_dir(struct perf_data *data, int nr)
> +{
> + struct perf_data_file *files = NULL;
> + int i, ret = -1;
> +
> + files = malloc(nr * sizeof(*files));
> + if (!files)
> + return -ENOMEM;
> +
> + data->dir.files = files;
> + data->dir.nr = nr;
> +
> + for (i = 0; i < nr; i++) {
> + struct perf_data_file *file = &files[i];
> +
> + if (asprintf(&file->path, "%s/data.%d", data->path, i) < 0)
> + goto out_err;
> +
> + ret = open(file->path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
> + if (ret < 0)
> + goto out_err;
> +
> + file->fd = ret;
> + }
> +
> + return 0;
> +
> +out_err:
> + free_dir(files, i);

It looks like this is more unlink(dir) than close_files_in_dir().

Alexey

> + return ret;
> +}
> +
> static bool check_pipe(struct perf_data *data)
> {
> struct stat st;
> diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> index 2bce28117ccf..3b4115dc777f 100644
> --- a/tools/perf/util/data.h
> +++ b/tools/perf/util/data.h
> @@ -21,6 +21,11 @@ struct perf_data {
> bool is_pipe;
> bool force;
> enum perf_data_mode mode;
> +
> + struct {
> + struct perf_data_file *files;
> + int nr;
> + } dir;
> };
>
> static inline bool perf_data__is_read(struct perf_data *data)
> @@ -64,4 +69,7 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
> int perf_data__switch(struct perf_data *data,
> const char *postfix,
> size_t pos, bool at_exit);
> +
> +int perf_data__create_dir(struct perf_data *data, int nr);
> +void perf_data__free_dir(struct perf_data *data);
> #endif /* __PERF_DATA_H */
>

2019-02-05 12:37:21

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH 14/14] perf record: Add --dir option to store data in directory


On 03.02.2019 18:30, Jiri Olsa wrote:
> Adding --dir option to store data in directory. It's next
> step for multiple threads in record. It's not possible
> to make directory data via --dir option, like:
>
> $ perf record --dir perf bench sched messaging
> $ ls -l perf.data
> total 344
> -rw-------. 1 jolsa jolsa 43864 Jan 20 22:26 data.0
> -rw-------. 1 jolsa jolsa 30464 Jan 20 22:26 data.1
> -rw-------. 1 jolsa jolsa 53816 Jan 20 22:26 data.2
> -rw-------. 1 jolsa jolsa 30368 Jan 20 22:26 data.3
> -rw-------. 1 jolsa jolsa 40088 Jan 20 22:26 data.4
> -rw-------. 1 jolsa jolsa 42592 Jan 20 22:26 data.5
> -rw-------. 1 jolsa jolsa 56136 Jan 20 22:26 data.6
> -rw-------. 1 jolsa jolsa 25992 Jan 20 22:26 data.7
> -rw-------. 1 jolsa jolsa 8832 Jan 20 22:26 header
>
> There's a data file created for every cpu and it's storing
> data for those cpu maps.
>
> It's possible to transform directory data into standard
> perf.data file via following inject command:
>
> $ perf inject -o perf.data.file -i perf.data
>
> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/Documentation/perf-record.txt | 3 ++
> tools/perf/builtin-record.c | 59 ++++++++++++++++++++++--
> tools/perf/util/mmap.h | 23 ++++-----
> 3 files changed, 70 insertions(+), 15 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index d232b13ea713..8dcdc8cabcad 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -505,6 +505,9 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
>
> Implies --tail-synthesize.
>
> +--dir::
> +Store data into directory with one data file for cpu.
> +

Makes sense to mention compatibility with -o option and per-thread buffer mapping.

> SEE ALSO
> --------
> linkperf:perf-stat[1], linkperf:perf-list[1]
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index cd02ab3ec4ff..87e39b9cc7bd 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -111,17 +111,21 @@ static bool switch_output_time(struct record *rec)
> trigger_is_ready(&switch_output_trigger);
> }
>
> -static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
> +static int record__write(struct record *rec, struct perf_mmap *map,
> void *bf, size_t size)
> {
> - struct perf_data_file *file = &rec->session->data->file;
> + struct perf_data_file *file = &rec->data.file;
> +
> + if (map && map->file)
> + file = map->file;

For AIO mode per-cpu streaming could be done in parallel because Posix
AIO API uses a separate thread for every open data.# or header fd.

>
> if (perf_data_file__write(file, bf, size) < 0) {
> pr_err("failed to write perf data, error: %m\n");
> return -1;
> }
>
> - rec->bytes_written += size;
> + if (file == &rec->data.file)
> + rec->bytes_written += size;

switch-output logic now tracks header file size only?
If so, it probably needs to be corrected or simply disabled
for --output_dir mode.

Alexey

>
> if (switch_output_size(rec))
> trigger_hit(&switch_output_trigger);
> @@ -563,6 +567,25 @@ static int record__mmap_evlist(struct record *rec,
> return 0;
> }
>
> +static int record__mmap_dir_data(struct record *rec)
> +{
> + struct perf_evlist *evlist = rec->evlist;
> + struct perf_data *data = &rec->data;
> + int i, ret, nr = evlist->nr_mmaps;
> +
> + ret = perf_data__create_dir(data, nr);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < nr; i++) {
> + struct perf_mmap *map = &evlist->mmap[i];
> +
> + map->file = &data->dir.files[i];
> + }
> +
> + return 0;
> +}
> +
> static int record__mmap(struct record *rec)
> {
> return record__mmap_evlist(rec, rec->evlist);
> @@ -792,8 +815,12 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
> /*
> * Mark the round finished in case we wrote
> * at least one event.
> + *
> + * No need for round events in directory mode,
> + * because per-cpu files/maps have sorted data
> + * from kernel.
> */
> - if (bytes_written != rec->bytes_written)
> + if (!perf_data__is_dir(&rec->data) && bytes_written != rec->bytes_written)
> rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
>
> if (overwrite)
> @@ -851,6 +878,9 @@ record__finish_output(struct record *rec)
> rec->session->header.data_size += rec->bytes_written;
> data->file.size = lseek(perf_data__fd(data), 0, SEEK_CUR);
>
> + if (perf_data__is_dir(data))
> + perf_data__update_dir(data);
> +
> if (!rec->no_buildid) {
> process_buildids(rec);
>
> @@ -1170,11 +1200,23 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> if (data->is_pipe && rec->evlist->nr_entries == 1)
> rec->opts.sample_id = true;
>
> + if (data->is_pipe && perf_data__is_dir(data)) {
> + pr_err("Directory output is not allowed for pipe output\n");
> + err = -1;
> + goto out_child;
> + }
> +
> if (record__open(rec) != 0) {
> err = -1;
> goto out_child;
> }
>
> + if (perf_data__is_dir(data)) {
> + err = record__mmap_dir_data(rec);
> + if (err)
> + goto out_child;
> + }
> +
> err = bpf__apply_obj_config();
> if (err) {
> char errbuf[BUFSIZ];
> @@ -1962,6 +2004,8 @@ static struct option __record_options[] = {
> &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
> record__aio_parse),
> #endif
> + OPT_BOOLEAN(0, "dir", &record.data.is_dir,
> + "Store data into directory perf.data"),
> OPT_END()
> };
>
> @@ -2113,6 +2157,13 @@ int cmd_record(int argc, const char **argv)
> goto out;
> }
>
> + if (perf_data__is_dir(&rec->data)) {
> + if (!rec->opts.sample_time) {
> + pr_err("Sample timestamp is required for indexing\n");
> + goto out;
> + }
> + }
> +
> if (rec->opts.target.tid && !rec->opts.no_inherit_set)
> rec->opts.no_inherit = true;
>
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index e566c19b242b..3e8595a8d6ce 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -19,17 +19,18 @@ struct aiocb;
> * @refcnt - e.g. code using PERF_EVENT_IOC_SET_OUTPUT to share this
> */
> struct perf_mmap {
> - void *base;
> - int mask;
> - int fd;
> - int cpu;
> - refcount_t refcnt;
> - u64 prev;
> - u64 start;
> - u64 end;
> - bool overwrite;
> - struct auxtrace_mmap auxtrace_mmap;
> - char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> + void *base;
> + int mask;
> + int fd;
> + int cpu;
> + refcount_t refcnt;
> + u64 prev;
> + u64 start;
> + u64 end;
> + bool overwrite;
> + struct auxtrace_mmap auxtrace_mmap;
> + struct perf_data_file *file;
> + char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> #ifdef HAVE_AIO_SUPPORT
> struct {
> void **data;
>

2019-02-05 13:45:24

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
> Jiri,
>
> While you're looking at the output format, I think it would be good
> time to simplify the code handling perf.data file.
> Today, perf record can emit in two formats: file mode or pipe mode.
> This adds complexity in the code and
> is error prone as the file mode path is tested more than the pipe mode
> path. We have run into multiple issues with
> the pipe mode in recent years. There is no real reason why we need to
> maintain two formats. If I recall, the pipe format
> was introduced because on pipes you cannot lseek to update the headers
> and therefore some of the information present as tables
> updated on the fly needed to be generated as pseudo records by the
> tool. I believe that the pipe format covers all the needs and could
> supersede the file mode format. That would simplify code in perf
> record and eliminate the risk of errors when new headers
> are introduced.

yep, I think we have almost all the features covered for pipe mode,
and we have all necessary events to describe events features

so with some effort we could switch off the superfluos file header
and use only events to describe events ;-) make sense, I'll check
on it

>
> Related to your effort to make perf record multi-threaded, I was
> wondering how this would impact pipe mode.
> Could you explain?

there's no support for threaded pipe processing at the moment,
currently the data are stored in directory:

$ ls -l perf.data
total 344
-rw-------. 1 jolsa jolsa 43864 Jan 20 22:26 data.0
-rw-------. 1 jolsa jolsa 30464 Jan 20 22:26 data.1
-rw-------. 1 jolsa jolsa 53816 Jan 20 22:26 data.2
-rw-------. 1 jolsa jolsa 30368 Jan 20 22:26 data.3
-rw-------. 1 jolsa jolsa 40088 Jan 20 22:26 data.4
-rw-------. 1 jolsa jolsa 42592 Jan 20 22:26 data.5
-rw-------. 1 jolsa jolsa 56136 Jan 20 22:26 data.6
-rw-------. 1 jolsa jolsa 25992 Jan 20 22:26 data.7

^^^^ those are raw data files, contains only events with common header
as stored by perf or kernel

-rw-------. 1 jolsa jolsa 8832 Jan 20 22:26 header

^^^^ this one currently holds perf.data file header, describing events
and features

if we switched to pipe mode by default we could omit the header file
and find a way to push the data streams through single file, like with
a new event describing the data stream.. we can have an option for that
so you could do something like:

# perf record --threads --single-file -a ... | perf report -i -

however having single thread storing storing into single file without
any other processing is important on record side (for minimal overhead),
so I think we should keep creating the directory with data files also
for pipe mode to have minimal overhead

jirka

2019-02-05 13:47:08

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 01/14] perf tools: Make rm_rf to remove single file

On Tue, Feb 05, 2019 at 02:33:06PM +0300, Alexey Budankov wrote:
>
> On 03.02.2019 18:30, Jiri Olsa wrote:
> > Let rm_rf remove file if it's provided by path.
> >
> > Link: http://lkml.kernel.org/n/[email protected]
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/util.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> > index 320b0fef249a..58b8d6a8bfbc 100644
> > --- a/tools/perf/util/util.c
> > +++ b/tools/perf/util/util.c
> > @@ -125,8 +125,14 @@ int rm_rf(const char *path)
> > char namebuf[PATH_MAX];
> >
> > dir = opendir(path);
> > - if (dir == NULL)
> > - return 0;
> > + if (dir == NULL) {
> > + /*
> > + * The path does not exist or is not directory,
> > + * so there's no harm to try remove it. This way
> > + * rm_rf will work over single file.
> > + */
>
> This can also happen due to lack of fds or memory.
> Not sure file still has to be deleted in these cases.

ok, will do proper error check

thanks,
jirka

>
> - Alexey
>
> > + return unlink(path);
> > + }
> >
> > while ((d = readdir(dir)) != NULL && !ret) {
> > struct stat statbuf;
> >

2019-02-05 13:48:58

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 06/14] perf data: Add perf_data__(create_dir|free_dir) functions

On Tue, Feb 05, 2019 at 02:52:58PM +0300, Alexey Budankov wrote:
>
> On 03.02.2019 18:30, Jiri Olsa wrote:
> > Adding perf_data__create_dir to create nr files inside
> > struct perf_data path directory:
> > int perf_data__create_dir(struct perf_data *data, int nr);
> >
> > and function to free that data:
> > void perf_data__free_dir(struct perf_data *data);
> >
> > Link: http://lkml.kernel.org/n/[email protected]
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/data.c | 47 ++++++++++++++++++++++++++++++++++++++++++
> > tools/perf/util/data.h | 8 +++++++
> > 2 files changed, 55 insertions(+)
> >
> > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> > index 0a3051cc0ea0..ff1d9e5bd68d 100644
> > --- a/tools/perf/util/data.c
> > +++ b/tools/perf/util/data.c
> > @@ -7,11 +7,58 @@
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <string.h>
> > +#include <asm/bug.h>
> >
> > #include "data.h"
> > #include "util.h"
> > #include "debug.h"
> >
> > +static void free_dir(struct perf_data_file *files, int nr)
> > +{
> > + while (--nr >= 1) {
> > + close(files[nr].fd);
> > + free(files[nr].path);
> > + }
> > + free(files);
>
> It implements closing of created files and frees corresponding memory.
> However it doesn't delete the files what looks like the proper rollback
> from perf_data__create_dir() in case of some open() failure.

we don't di rollback event for the single file:

[jolsa@krava perf]$ ./perf record -b ls
Error:
You may not have permission to collect stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is 2:

-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

kernel.perf_event_paranoid = -1

[jolsa@krava perf]$ ll perf.data
-rw-------. 1 jolsa jolsa 0 Feb 5 14:40 perf.data

there're multiple points where we could fail during the processing
and fail record command with unfinished perf.data, but yes, I think
we should address that and have some cleanup, I'll check

thanks,
jirka

2019-02-05 13:50:53

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 06/14] perf data: Add perf_data__(create_dir|free_dir) functions

Em Sun, Feb 03, 2019 at 04:30:10PM +0100, Jiri Olsa escreveu:
> Adding perf_data__create_dir to create nr files inside
> struct perf_data path directory:
> int perf_data__create_dir(struct perf_data *data, int nr);
>
> and function to free that data:
> void perf_data__free_dir(struct perf_data *data);

Can you please rename this to perf_data__remove_dir(), I think it is
more appropriate, just like that rm_rf() thing.

- Arnaldo

> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/data.c | 47 ++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/data.h | 8 +++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index 0a3051cc0ea0..ff1d9e5bd68d 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -7,11 +7,58 @@
> #include <fcntl.h>
> #include <unistd.h>
> #include <string.h>
> +#include <asm/bug.h>
>
> #include "data.h"
> #include "util.h"
> #include "debug.h"
>
> +static void free_dir(struct perf_data_file *files, int nr)
> +{
> + while (--nr >= 1) {
> + close(files[nr].fd);
> + free(files[nr].path);
> + }
> + free(files);
> +}
> +
> +void perf_data__free_dir(struct perf_data *data)
> +{
> + free_dir(data->dir.files, data->dir.nr);
> +}
> +
> +int perf_data__create_dir(struct perf_data *data, int nr)
> +{
> + struct perf_data_file *files = NULL;
> + int i, ret = -1;
> +
> + files = malloc(nr * sizeof(*files));
> + if (!files)
> + return -ENOMEM;
> +
> + data->dir.files = files;
> + data->dir.nr = nr;
> +
> + for (i = 0; i < nr; i++) {
> + struct perf_data_file *file = &files[i];
> +
> + if (asprintf(&file->path, "%s/data.%d", data->path, i) < 0)
> + goto out_err;
> +
> + ret = open(file->path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
> + if (ret < 0)
> + goto out_err;
> +
> + file->fd = ret;
> + }
> +
> + return 0;
> +
> +out_err:
> + free_dir(files, i);
> + return ret;
> +}
> +
> static bool check_pipe(struct perf_data *data)
> {
> struct stat st;
> diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> index 2bce28117ccf..3b4115dc777f 100644
> --- a/tools/perf/util/data.h
> +++ b/tools/perf/util/data.h
> @@ -21,6 +21,11 @@ struct perf_data {
> bool is_pipe;
> bool force;
> enum perf_data_mode mode;
> +
> + struct {
> + struct perf_data_file *files;
> + int nr;
> + } dir;
> };
>
> static inline bool perf_data__is_read(struct perf_data *data)
> @@ -64,4 +69,7 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
> int perf_data__switch(struct perf_data *data,
> const char *postfix,
> size_t pos, bool at_exit);
> +
> +int perf_data__create_dir(struct perf_data *data, int nr);
> +void perf_data__free_dir(struct perf_data *data);
> #endif /* __PERF_DATA_H */
> --
> 2.17.2

--

- Arnaldo

2019-02-05 13:52:44

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 14/14] perf record: Add --dir option to store data in directory

On Tue, Feb 05, 2019 at 03:36:24PM +0300, Alexey Budankov wrote:
>
> On 03.02.2019 18:30, Jiri Olsa wrote:
> > Adding --dir option to store data in directory. It's next
> > step for multiple threads in record. It's not possible
> > to make directory data via --dir option, like:
> >
> > $ perf record --dir perf bench sched messaging
> > $ ls -l perf.data
> > total 344
> > -rw-------. 1 jolsa jolsa 43864 Jan 20 22:26 data.0
> > -rw-------. 1 jolsa jolsa 30464 Jan 20 22:26 data.1
> > -rw-------. 1 jolsa jolsa 53816 Jan 20 22:26 data.2
> > -rw-------. 1 jolsa jolsa 30368 Jan 20 22:26 data.3
> > -rw-------. 1 jolsa jolsa 40088 Jan 20 22:26 data.4
> > -rw-------. 1 jolsa jolsa 42592 Jan 20 22:26 data.5
> > -rw-------. 1 jolsa jolsa 56136 Jan 20 22:26 data.6
> > -rw-------. 1 jolsa jolsa 25992 Jan 20 22:26 data.7
> > -rw-------. 1 jolsa jolsa 8832 Jan 20 22:26 header
> >
> > There's a data file created for every cpu and it's storing
> > data for those cpu maps.
> >
> > It's possible to transform directory data into standard
> > perf.data file via following inject command:
> >
> > $ perf inject -o perf.data.file -i perf.data
> >
> > Link: http://lkml.kernel.org/n/[email protected]
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/Documentation/perf-record.txt | 3 ++
> > tools/perf/builtin-record.c | 59 ++++++++++++++++++++++--
> > tools/perf/util/mmap.h | 23 ++++-----
> > 3 files changed, 70 insertions(+), 15 deletions(-)
> >
> > diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> > index d232b13ea713..8dcdc8cabcad 100644
> > --- a/tools/perf/Documentation/perf-record.txt
> > +++ b/tools/perf/Documentation/perf-record.txt
> > @@ -505,6 +505,9 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
> >
> > Implies --tail-synthesize.
> >
> > +--dir::
> > +Store data into directory with one data file for cpu.
> > +
>
> Makes sense to mention compatibility with -o option and per-thread buffer mapping.

right, will do

>
> > SEE ALSO
> > --------
> > linkperf:perf-stat[1], linkperf:perf-list[1]
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index cd02ab3ec4ff..87e39b9cc7bd 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -111,17 +111,21 @@ static bool switch_output_time(struct record *rec)
> > trigger_is_ready(&switch_output_trigger);
> > }
> >
> > -static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
> > +static int record__write(struct record *rec, struct perf_mmap *map,
> > void *bf, size_t size)
> > {
> > - struct perf_data_file *file = &rec->session->data->file;
> > + struct perf_data_file *file = &rec->data.file;
> > +
> > + if (map && map->file)
> > + file = map->file;
>
> For AIO mode per-cpu streaming could be done in parallel because Posix
> AIO API uses a separate thread for every open data.# or header fd.

yes, I was thinking of that and I plan to try that to see
if there's some benefit

>
> >
> > if (perf_data_file__write(file, bf, size) < 0) {
> > pr_err("failed to write perf data, error: %m\n");
> > return -1;
> > }
> >
> > - rec->bytes_written += size;
> > + if (file == &rec->data.file)
> > + rec->bytes_written += size;
>
> switch-output logic now tracks header file size only?
> If so, it probably needs to be corrected or simply disabled
> for --output_dir mode.

the rec->bytes_written value is used to track header file size,
which is used to fill in the header data section in here
record__finish_output, we can't mix other file sizes into it

the switch-output option still needs some changes to work
over the directory data

jirka

2019-02-05 13:54:35

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 06/14] perf data: Add perf_data__(create_dir|free_dir) functions

On Tue, Feb 05, 2019 at 10:46:07AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Feb 03, 2019 at 04:30:10PM +0100, Jiri Olsa escreveu:
> > Adding perf_data__create_dir to create nr files inside
> > struct perf_data path directory:
> > int perf_data__create_dir(struct perf_data *data, int nr);
> >
> > and function to free that data:
> > void perf_data__free_dir(struct perf_data *data);
>
> Can you please rename this to perf_data__remove_dir(), I think it is
> more appropriate, just like that rm_rf() thing.

ok, I will check on the rollback in the new version

jirka

2019-02-11 10:20:50

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
> > Jiri,
> >
> > While you're looking at the output format, I think it would be good
> > time to simplify the code handling perf.data file.
> > Today, perf record can emit in two formats: file mode or pipe mode.
> > This adds complexity in the code and
> > is error prone as the file mode path is tested more than the pipe mode
> > path. We have run into multiple issues with
> > the pipe mode in recent years. There is no real reason why we need to
> > maintain two formats. If I recall, the pipe format
> > was introduced because on pipes you cannot lseek to update the headers
> > and therefore some of the information present as tables
> > updated on the fly needed to be generated as pseudo records by the
> > tool. I believe that the pipe format covers all the needs and could
> > supersede the file mode format. That would simplify code in perf
> > record and eliminate the risk of errors when new headers
> > are introduced.
>
> yep, I think we have almost all the features covered for pipe mode,
> and we have all necessary events to describe events features
>
> so with some effort we could switch off the superfluos file header
> and use only events to describe events ;-) make sense, I'll check
> on it

so following features are not synthesized:

FEAT_OPN(TRACING_DATA, tracing_data, false),
FEAT_OPN(BUILD_ID, build_id, false),
FEAT_OPN(BRANCH_STACK, branch_stack, false),
FEAT_OPN(AUXTRACE, auxtrace, false),
FEAT_OPN(STAT, stat, false),
FEAT_OPN(CACHE, cache, true),

I think all could be added and worked around with exception
of BUILD_ID, which we store at the end (after processing
all data) and we need it early in the report phase

maybe it's time to re-think that buildid -> mmap event
association again, because it's pain in current implementation
as well

looks like bpf code is actualy getting build ids and storing
it for the callchains in kernel.. we can check if we can do
something similar for mmap events

jirka

2019-02-11 20:39:15

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Jiri,

On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
>
> On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
> > > Jiri,
> > >
> > > While you're looking at the output format, I think it would be good
> > > time to simplify the code handling perf.data file.
> > > Today, perf record can emit in two formats: file mode or pipe mode.
> > > This adds complexity in the code and
> > > is error prone as the file mode path is tested more than the pipe mode
> > > path. We have run into multiple issues with
> > > the pipe mode in recent years. There is no real reason why we need to
> > > maintain two formats. If I recall, the pipe format
> > > was introduced because on pipes you cannot lseek to update the headers
> > > and therefore some of the information present as tables
> > > updated on the fly needed to be generated as pseudo records by the
> > > tool. I believe that the pipe format covers all the needs and could
> > > supersede the file mode format. That would simplify code in perf
> > > record and eliminate the risk of errors when new headers
> > > are introduced.
> >
> > yep, I think we have almost all the features covered for pipe mode,
> > and we have all necessary events to describe events features
> >
> > so with some effort we could switch off the superfluos file header
> > and use only events to describe events ;-) make sense, I'll check
> > on it
>
> so following features are not synthesized:
>
> FEAT_OPN(TRACING_DATA, tracing_data, false),
> FEAT_OPN(BUILD_ID, build_id, false),
> FEAT_OPN(BRANCH_STACK, branch_stack, false),
> FEAT_OPN(AUXTRACE, auxtrace, false),
> FEAT_OPN(STAT, stat, false),
> FEAT_OPN(CACHE, cache, true),
>
What do you need for BRANCH_STACK?

> I think all could be added and worked around with exception
> of BUILD_ID, which we store at the end (after processing
> all data) and we need it early in the report phase
>
Buildids are injected after the fact via perf inject when in pipe mode.

> maybe it's time to re-think that buildid -> mmap event
> association again, because it's pain in current implementation
> as well
>
Sure, but what do you propose?

> looks like bpf code is actualy getting build ids and storing
> it for the callchains in kernel.. we can check if we can do
> something similar for mmap events
>
> jirka

2019-02-11 20:39:25

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Em Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian escreveu:
> Jiri,
>
> On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> >
> > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
> > > > Jiri,
> > > >
> > > > While you're looking at the output format, I think it would be good
> > > > time to simplify the code handling perf.data file.
> > > > Today, perf record can emit in two formats: file mode or pipe mode.
> > > > This adds complexity in the code and
> > > > is error prone as the file mode path is tested more than the pipe mode
> > > > path. We have run into multiple issues with
> > > > the pipe mode in recent years. There is no real reason why we need to
> > > > maintain two formats. If I recall, the pipe format
> > > > was introduced because on pipes you cannot lseek to update the headers
> > > > and therefore some of the information present as tables
> > > > updated on the fly needed to be generated as pseudo records by the
> > > > tool. I believe that the pipe format covers all the needs and could
> > > > supersede the file mode format. That would simplify code in perf
> > > > record and eliminate the risk of errors when new headers
> > > > are introduced.
> > >
> > > yep, I think we have almost all the features covered for pipe mode,
> > > and we have all necessary events to describe events features
> > >
> > > so with some effort we could switch off the superfluos file header
> > > and use only events to describe events ;-) make sense, I'll check
> > > on it
> >
> > so following features are not synthesized:
> >
> > FEAT_OPN(TRACING_DATA, tracing_data, false),
> > FEAT_OPN(BUILD_ID, build_id, false),
> > FEAT_OPN(BRANCH_STACK, branch_stack, false),
> > FEAT_OPN(AUXTRACE, auxtrace, false),
> > FEAT_OPN(STAT, stat, false),
> > FEAT_OPN(CACHE, cache, true),
> >
> What do you need for BRANCH_STACK?
>
> > I think all could be added and worked around with exception
> > of BUILD_ID, which we store at the end (after processing
> > all data) and we need it early in the report phase
> >
> Buildids are injected after the fact via perf inject when in pipe mode.
>
> > maybe it's time to re-think that buildid -> mmap event
> > association again, because it's pain in current implementation
> > as well
> >
> Sure, but what do you propose?

this keeps resurfacing, the idea is to have the building go together
with the PERF_RECORD_MMAP3 event, i.e. as part of setting up an
executable mapping the loader would get the buildid and ask the kernel
to keep it aroung, then when a PERF_RECORD_MMAP needs to be issued, it
can include the build id, so tooling will not need to get it.

Alternatively, we would have a separate thread to process
PERF_RECORD_MMAP events, and as soon as it gets one from the kernel,
augment it straight away with the build-id it reads from the ELF file,
i.e. no need to have the kernel provide it, do it just like we do with
PERF_RECORD_BPF_EVENT, which reminds me Song probably already posted
thise bits...

> > looks like bpf code is actualy getting build ids and storing
> > it for the callchains in kernel.. we can check if we can do
> > something similar for mmap events
> >
> > jirka

--

- Arnaldo

2019-02-11 20:40:20

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian wrote:
> Jiri,
>
> On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> >
> > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
> > > > Jiri,
> > > >
> > > > While you're looking at the output format, I think it would be good
> > > > time to simplify the code handling perf.data file.
> > > > Today, perf record can emit in two formats: file mode or pipe mode.
> > > > This adds complexity in the code and
> > > > is error prone as the file mode path is tested more than the pipe mode
> > > > path. We have run into multiple issues with
> > > > the pipe mode in recent years. There is no real reason why we need to
> > > > maintain two formats. If I recall, the pipe format
> > > > was introduced because on pipes you cannot lseek to update the headers
> > > > and therefore some of the information present as tables
> > > > updated on the fly needed to be generated as pseudo records by the
> > > > tool. I believe that the pipe format covers all the needs and could
> > > > supersede the file mode format. That would simplify code in perf
> > > > record and eliminate the risk of errors when new headers
> > > > are introduced.
> > >
> > > yep, I think we have almost all the features covered for pipe mode,
> > > and we have all necessary events to describe events features
> > >
> > > so with some effort we could switch off the superfluos file header
> > > and use only events to describe events ;-) make sense, I'll check
> > > on it
> >
> > so following features are not synthesized:
> >
> > FEAT_OPN(TRACING_DATA, tracing_data, false),
> > FEAT_OPN(BUILD_ID, build_id, false),
> > FEAT_OPN(BRANCH_STACK, branch_stack, false),
> > FEAT_OPN(AUXTRACE, auxtrace, false),
> > FEAT_OPN(STAT, stat, false),
> > FEAT_OPN(CACHE, cache, true),
> >
> What do you need for BRANCH_STACK?

nothing, I think it's just the flag

>
> > I think all could be added and worked around with exception
> > of BUILD_ID, which we store at the end (after processing
> > all data) and we need it early in the report phase
> >
> Buildids are injected after the fact via perf inject when in pipe mode.
>
> > maybe it's time to re-think that buildid -> mmap event
> > association again, because it's pain in current implementation
> > as well
> >
> Sure, but what do you propose?
>

this:

> > looks like bpf code is actualy getting build ids and storing
> > it for the callchains in kernel.. we can check if we can do
> > something similar for mmap events
> >
> > jirka

jirka

2019-02-11 20:43:01

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Arnaldo,

On Mon, Feb 11, 2019 at 10:55 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian escreveu:
> > Jiri,
> >
> > On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > > On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
> > > > > Jiri,
> > > > >
> > > > > While you're looking at the output format, I think it would be good
> > > > > time to simplify the code handling perf.data file.
> > > > > Today, perf record can emit in two formats: file mode or pipe mode.
> > > > > This adds complexity in the code and
> > > > > is error prone as the file mode path is tested more than the pipe mode
> > > > > path. We have run into multiple issues with
> > > > > the pipe mode in recent years. There is no real reason why we need to
> > > > > maintain two formats. If I recall, the pipe format
> > > > > was introduced because on pipes you cannot lseek to update the headers
> > > > > and therefore some of the information present as tables
> > > > > updated on the fly needed to be generated as pseudo records by the
> > > > > tool. I believe that the pipe format covers all the needs and could
> > > > > supersede the file mode format. That would simplify code in perf
> > > > > record and eliminate the risk of errors when new headers
> > > > > are introduced.
> > > >
> > > > yep, I think we have almost all the features covered for pipe mode,
> > > > and we have all necessary events to describe events features
> > > >
> > > > so with some effort we could switch off the superfluos file header
> > > > and use only events to describe events ;-) make sense, I'll check
> > > > on it
> > >
> > > so following features are not synthesized:
> > >
> > > FEAT_OPN(TRACING_DATA, tracing_data, false),
> > > FEAT_OPN(BUILD_ID, build_id, false),
> > > FEAT_OPN(BRANCH_STACK, branch_stack, false),
> > > FEAT_OPN(AUXTRACE, auxtrace, false),
> > > FEAT_OPN(STAT, stat, false),
> > > FEAT_OPN(CACHE, cache, true),
> > >
> > What do you need for BRANCH_STACK?
> >
> > > I think all could be added and worked around with exception
> > > of BUILD_ID, which we store at the end (after processing
> > > all data) and we need it early in the report phase
> > >
> > Buildids are injected after the fact via perf inject when in pipe mode.
> >
> > > maybe it's time to re-think that buildid -> mmap event
> > > association again, because it's pain in current implementation
> > > as well
> > >
> > Sure, but what do you propose?
>
> this keeps resurfacing, the idea is to have the building go together
> with the PERF_RECORD_MMAP3 event, i.e. as part of setting up an
> executable mapping the loader would get the buildid and ask the kernel
> to keep it aroung, then when a PERF_RECORD_MMAP needs to be issued, it
> can include the build id, so tooling will not need to get it.
>
And how would the dynamic loader (ld.so) communicate the buildid to the kernel?
How would that work for statically linked binaries.
I think you're say the kernel would parse the ELF header looking for
that note section
and extract the buildid from there. Is that what you are proposing?

> Alternatively, we would have a separate thread to process
> PERF_RECORD_MMAP events, and as soon as it gets one from the kernel,
> augment it straight away with the build-id it reads from the ELF file,
> i.e. no need to have the kernel provide it, do it just like we do with
> PERF_RECORD_BPF_EVENT, which reminds me Song probably already posted
> thise bits...
>
But that would not work in pipe mode, wouldn't it?
Unless that thread intercepts everything pushed to the pipe looking
for MMAP records.

> > > looks like bpf code is actualy getting build ids and storing
> > > it for the callchains in kernel.. we can check if we can do
> > > something similar for mmap events
> > >
> > > jirka
>
> --
>
> - Arnaldo

2019-02-11 20:44:20

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Em Mon, Feb 11, 2019 at 07:53:06PM +0100, Jiri Olsa escreveu:
> On Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian wrote:
> > On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> > > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > I think all could be added and worked around with exception
> > > of BUILD_ID, which we store at the end (after processing
> > > all data) and we need it early in the report phase

> > Buildids are injected after the fact via perf inject when in pipe mode.

> > > maybe it's time to re-think that buildid -> mmap event
> > > association again, because it's pain in current implementation
> > > as well

> > Sure, but what do you propose?

> this:
>
> > > looks like bpf code is actualy getting build ids and storing
> > > it for the callchains in kernel.. we can check if we can do
> > > something similar for mmap events

kernel/bpf/stackmap.c

/* Parse build ID from 64-bit ELF */
static int stack_map_get_build_id_64(void *page_addr,
unsigned char *build_id)

yeah, wasn't aware of that, good thing doing backports, huh? :-)

So do you thing about having a PERF_SAMPLE_BUILDID in sample_type and go
and stash that thing in PERF_RECORD_MMAP2?

- Arnaldo

2019-02-11 20:45:44

by Song Liu

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory



> On Feb 11, 2019, at 11:30 AM, Stephane Eranian <[email protected]> wrote:
>
> Arnaldo,
>
> On Mon, Feb 11, 2019 at 10:55 AM Arnaldo Carvalho de Melo
> <[email protected]> wrote:
>>
>> Em Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian escreveu:
>>> Jiri,
>>>
>>> On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
>>>>
>>>> On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
>>>>> On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
>>>>>> Jiri,
>>>>>>
>>>>>> While you're looking at the output format, I think it would be good
>>>>>> time to simplify the code handling perf.data file.
>>>>>> Today, perf record can emit in two formats: file mode or pipe mode.
>>>>>> This adds complexity in the code and
>>>>>> is error prone as the file mode path is tested more than the pipe mode
>>>>>> path. We have run into multiple issues with
>>>>>> the pipe mode in recent years. There is no real reason why we need to
>>>>>> maintain two formats. If I recall, the pipe format
>>>>>> was introduced because on pipes you cannot lseek to update the headers
>>>>>> and therefore some of the information present as tables
>>>>>> updated on the fly needed to be generated as pseudo records by the
>>>>>> tool. I believe that the pipe format covers all the needs and could
>>>>>> supersede the file mode format. That would simplify code in perf
>>>>>> record and eliminate the risk of errors when new headers
>>>>>> are introduced.
>>>>>
>>>>> yep, I think we have almost all the features covered for pipe mode,
>>>>> and we have all necessary events to describe events features
>>>>>
>>>>> so with some effort we could switch off the superfluos file header
>>>>> and use only events to describe events ;-) make sense, I'll check
>>>>> on it
>>>>
>>>> so following features are not synthesized:
>>>>
>>>> FEAT_OPN(TRACING_DATA, tracing_data, false),
>>>> FEAT_OPN(BUILD_ID, build_id, false),
>>>> FEAT_OPN(BRANCH_STACK, branch_stack, false),
>>>> FEAT_OPN(AUXTRACE, auxtrace, false),
>>>> FEAT_OPN(STAT, stat, false),
>>>> FEAT_OPN(CACHE, cache, true),
>>>>
>>> What do you need for BRANCH_STACK?
>>>
>>>> I think all could be added and worked around with exception
>>>> of BUILD_ID, which we store at the end (after processing
>>>> all data) and we need it early in the report phase
>>>>
>>> Buildids are injected after the fact via perf inject when in pipe mode.
>>>
>>>> maybe it's time to re-think that buildid -> mmap event
>>>> association again, because it's pain in current implementation
>>>> as well
>>>>
>>> Sure, but what do you propose?
>>
>> this keeps resurfacing, the idea is to have the building go together
>> with the PERF_RECORD_MMAP3 event, i.e. as part of setting up an
>> executable mapping the loader would get the buildid and ask the kernel
>> to keep it aroung, then when a PERF_RECORD_MMAP needs to be issued, it
>> can include the build id, so tooling will not need to get it.
>>
> And how would the dynamic loader (ld.so) communicate the buildid to the kernel?
> How would that work for statically linked binaries.
> I think you're say the kernel would parse the ELF header looking for
> that note section
> and extract the buildid from there. Is that what you are proposing?

We have kernel parses ELF header for BUILD-ID in BPF side. You can
find the code in stack_map_get_build_id_offset() and functions called
by it.

>
>> Alternatively, we would have a separate thread to process
>> PERF_RECORD_MMAP events, and as soon as it gets one from the kernel,
>> augment it straight away with the build-id it reads from the ELF file,
>> i.e. no need to have the kernel provide it, do it just like we do with
>> PERF_RECORD_BPF_EVENT, which reminds me Song probably already posted
>> thise bits...
>>
> But that would not work in pipe mode, wouldn't it?
> Unless that thread intercepts everything pushed to the pipe looking
> for MMAP records.

For PERF_RECORD_BPF_EVENT, I am adding a separate thread, which only
listen to PERF_RECORD_BPF_EVENT with watermark of 1. This means,
each PERF_RECORD_BPF_EVENT is sent to two ring buffers. One of them
got written to the pipe, the other is only processed by the listening
thread. Please see https://patchwork.ozlabs.org/patch/1039091/ for
details.

Thanks,
Song

>
>>>> looks like bpf code is actualy getting build ids and storing
>>>> it for the callchains in kernel.. we can check if we can do
>>>> something similar for mmap events
>>>>
>>>> jirka
>>
>> --
>>
>> - Arnaldo


2019-02-11 20:46:10

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 11, 2019 at 12:18 PM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Feb 11, 2019 at 04:32:02PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Feb 11, 2019 at 07:53:06PM +0100, Jiri Olsa escreveu:
> > > On Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian wrote:
> > > > On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> > > > > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > > > I think all could be added and worked around with exception
> > > > > of BUILD_ID, which we store at the end (after processing
> > > > > all data) and we need it early in the report phase
> >
> > > > Buildids are injected after the fact via perf inject when in pipe mode.
> >
> > > > > maybe it's time to re-think that buildid -> mmap event
> > > > > association again, because it's pain in current implementation
> > > > > as well
> >
> > > > Sure, but what do you propose?
> >
> > > this:
> > >
> > > > > looks like bpf code is actualy getting build ids and storing
> > > > > it for the callchains in kernel.. we can check if we can do
> > > > > something similar for mmap events
> >
> > kernel/bpf/stackmap.c
> >
> > /* Parse build ID from 64-bit ELF */
> > static int stack_map_get_build_id_64(void *page_addr,
> > unsigned char *build_id)
> >
> > yeah, wasn't aware of that, good thing doing backports, huh? :-)
> >
> > So do you thing about having a PERF_SAMPLE_BUILDID in sample_type and go
> > and stash that thing in PERF_RECORD_MMAP2?
>
That would be special processing. Normally the sample_type go into
each RECORD_SAMPLE.
So I would not recommend this approach.

> I thought having new MMAP3 event version with buildid field in it
> if available and/or enabled by bit in perf_event_attr
>
I think MMAP3 is a cleaner approach, though it adds yet another MMAP event.

> jirka

2019-02-11 20:46:16

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 11, 2019 at 04:32:02PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Feb 11, 2019 at 07:53:06PM +0100, Jiri Olsa escreveu:
> > On Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian wrote:
> > > On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> > > > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > > I think all could be added and worked around with exception
> > > > of BUILD_ID, which we store at the end (after processing
> > > > all data) and we need it early in the report phase
>
> > > Buildids are injected after the fact via perf inject when in pipe mode.
>
> > > > maybe it's time to re-think that buildid -> mmap event
> > > > association again, because it's pain in current implementation
> > > > as well
>
> > > Sure, but what do you propose?
>
> > this:
> >
> > > > looks like bpf code is actualy getting build ids and storing
> > > > it for the callchains in kernel.. we can check if we can do
> > > > something similar for mmap events
>
> kernel/bpf/stackmap.c
>
> /* Parse build ID from 64-bit ELF */
> static int stack_map_get_build_id_64(void *page_addr,
> unsigned char *build_id)
>
> yeah, wasn't aware of that, good thing doing backports, huh? :-)
>
> So do you thing about having a PERF_SAMPLE_BUILDID in sample_type and go
> and stash that thing in PERF_RECORD_MMAP2?

I thought having new MMAP3 event version with buildid field in it
if available and/or enabled by bit in perf_event_attr

jirka

2019-02-14 18:42:31

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Mon, Feb 11, 2019 at 12:43:22PM -0800, Stephane Eranian wrote:
> On Mon, Feb 11, 2019 at 12:18 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Mon, Feb 11, 2019 at 04:32:02PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Feb 11, 2019 at 07:53:06PM +0100, Jiri Olsa escreveu:
> > > > On Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian wrote:
> > > > > On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <[email protected]> wrote:
> > > > > > On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
> > > > > > I think all could be added and worked around with exception
> > > > > > of BUILD_ID, which we store at the end (after processing
> > > > > > all data) and we need it early in the report phase
> > >
> > > > > Buildids are injected after the fact via perf inject when in pipe mode.
> > >
> > > > > > maybe it's time to re-think that buildid -> mmap event
> > > > > > association again, because it's pain in current implementation
> > > > > > as well
> > >
> > > > > Sure, but what do you propose?
> > >
> > > > this:
> > > >
> > > > > > looks like bpf code is actualy getting build ids and storing
> > > > > > it for the callchains in kernel.. we can check if we can do
> > > > > > something similar for mmap events
> > >
> > > kernel/bpf/stackmap.c
> > >
> > > /* Parse build ID from 64-bit ELF */
> > > static int stack_map_get_build_id_64(void *page_addr,
> > > unsigned char *build_id)
> > >
> > > yeah, wasn't aware of that, good thing doing backports, huh? :-)
> > >
> > > So do you thing about having a PERF_SAMPLE_BUILDID in sample_type and go
> > > and stash that thing in PERF_RECORD_MMAP2?
> >
> That would be special processing. Normally the sample_type go into
> each RECORD_SAMPLE.
> So I would not recommend this approach.
>
> > I thought having new MMAP3 event version with buildid field in it
> > if available and/or enabled by bit in perf_event_attr
> >
> I think MMAP3 is a cleaner approach, though it adds yet another MMAP event.

actually I realized this might not help at the end at all

what we do now is that we scan the data (after it's recorded)
to get the list of binaries that got touched during the profile
and store those in .debug cache based on the build ids

we can't just store ALL of the binaries that the session comes
across via mmap events

I can see the build id in mmap helping to resolve the race
issue when some binary change during the profile session and
we have no idea and report on wrong one.. but I dont see
people complaining about this at all

jirka

2019-02-14 23:20:26

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Em Thu, Feb 14, 2019 at 12:34:50PM +0100, Jiri Olsa escreveu:
> On Mon, Feb 11, 2019 at 12:43:22PM -0800, Stephane Eranian wrote:
> > On Mon, Feb 11, 2019 at 12:18 PM Jiri Olsa <[email protected]> wrote:
> > > I thought having new MMAP3 event version with buildid field in it
> > > if available and/or enabled by bit in perf_event_attr

> > I think MMAP3 is a cleaner approach, though it adds yet another MMAP event.

> actually I realized this might not help at the end at all

> what we do now is that we scan the data (after it's recorded)

We wouldn't have to do that scan of the data at the end if we have the
build-id in the MMAP3 records.

> to get the list of binaries that got touched during the profile
> and store those in .debug cache based on the build ids

> we can't just store ALL of the binaries that the session comes
> across via mmap events

We don't have to, we never did that at 'perf record' time, this is
something for 'perf archive' to do, see below.

> I can see the build id in mmap helping to resolve the race
> issue when some binary change during the profile session and
> we have no idea and report on wrong one.. but I dont see
> people complaining about this at all

But with the build id in the MMAPs we wouldn't need to scan anything at
'perf record' time, just at 'perf archive' time, where we would scan
everything, and as soon as we find a hit, we add that DSO to the list of
things we need to put in the tarball.

- Arnaldo

2019-02-14 23:28:18

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Thu, Feb 14, 2019 at 09:57:59AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Feb 14, 2019 at 12:34:50PM +0100, Jiri Olsa escreveu:
> > On Mon, Feb 11, 2019 at 12:43:22PM -0800, Stephane Eranian wrote:
> > > On Mon, Feb 11, 2019 at 12:18 PM Jiri Olsa <[email protected]> wrote:
> > > > I thought having new MMAP3 event version with buildid field in it
> > > > if available and/or enabled by bit in perf_event_attr
>
> > > I think MMAP3 is a cleaner approach, though it adds yet another MMAP event.
>
> > actually I realized this might not help at the end at all
>
> > what we do now is that we scan the data (after it's recorded)
>
> We wouldn't have to do that scan of the data at the end if we have the
> build-id in the MMAP3 records.
>
> > to get the list of binaries that got touched during the profile
> > and store those in .debug cache based on the build ids
>
> > we can't just store ALL of the binaries that the session comes
> > across via mmap events
>
> We don't have to, we never did that at 'perf record' time, this is
> something for 'perf archive' to do, see below.
>
> > I can see the build id in mmap helping to resolve the race
> > issue when some binary change during the profile session and
> > we have no idea and report on wrong one.. but I dont see
> > people complaining about this at all
>
> But with the build id in the MMAPs we wouldn't need to scan anything at
> 'perf record' time, just at 'perf archive' time, where we would scan
> everything, and as soon as we find a hit, we add that DSO to the list of
> things we need to put in the tarball.

ok.. it might little change the expected behavour in that
you will not have .debug cache populated until you run
perf archive.. some profile data might stop report after
you reinstall the binary..

on the other hande '.debug' cache would stop growing
uncontrolably.. so I think I'd be ok with this

jirka

2019-02-14 23:49:29

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

Em Thu, Feb 14, 2019 at 02:26:38PM +0100, Jiri Olsa escreveu:
> On Thu, Feb 14, 2019 at 09:57:59AM -0300, Arnaldo Carvalho de Melo wrote:
> > But with the build id in the MMAPs we wouldn't need to scan anything at
> > 'perf record' time, just at 'perf archive' time, where we would scan
> > everything, and as soon as we find a hit, we add that DSO to the list of
> > things we need to put in the tarball.

> ok.. it might little change the expected behavour in that you will not
> have .debug cache populated until you run perf archive.. some profile
> data might stop report after you reinstall the binary..

Well, nothing prevents us from continuing to, in 'perf record', go thru
the perf.data just collected to populate the .debug cache, its just that
we would do it just for that, for populating the cache, we wouldn't
_have_ to do that for collecting the buildids.

> on the other hand '.debug' cache would stop growing uncontrolably..
> so I think I'd be ok with this

That is another problem, and one that has to be solved in a way similar
to ccache's --max-size option.

The .debug cache is useful for various workflows, but may get in the way
for some others, which is why we have ways to disable it.

For instance, when working on some project it is handy to have copies of
binaries saved so that older perf.data files can find a file that was
rewritten by the compiler when doing changes in it, etc.

With the .debug cache and if using -g, we can get the older copy of the
binary _and_ its sources, annotation for older versions continue to
work, etc.

- Arnaldo

2019-02-15 02:15:47

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Thu, Feb 14, 2019 at 6:00 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Thu, Feb 14, 2019 at 02:26:38PM +0100, Jiri Olsa escreveu:
> > On Thu, Feb 14, 2019 at 09:57:59AM -0300, Arnaldo Carvalho de Melo wrote:
> > > But with the build id in the MMAPs we wouldn't need to scan anything at
> > > 'perf record' time, just at 'perf archive' time, where we would scan
> > > everything, and as soon as we find a hit, we add that DSO to the list of
> > > things we need to put in the tarball.
>
> > ok.. it might little change the expected behavour in that you will not
> > have .debug cache populated until you run perf archive.. some profile
> > data might stop report after you reinstall the binary..
>
Today perf record does collect the buildids once monitor is completed,
it does one
pass over the perf.data file looking for MMAP records, or at least in
the version I am
more familiar with.

> Well, nothing prevents us from continuing to, in 'perf record', go thru
> the perf.data just collected to populate the .debug cache, its just that
> we would do it just for that, for populating the cache, we wouldn't
> _have_ to do that for collecting the buildids.

Sure, for compatibility reasons.
In pipe mode, this would also avoid the need for perf inject -b -i -,
which is a win.

perf archive is useful if you do not have a way to locate the binary using only
the buildid on the analysis machine.
>
> > on the other hand '.debug' cache would stop growing uncontrolably..
> > so I think I'd be ok with this
>
I agree!

> That is another problem, and one that has to be solved in a way similar
> to ccache's --max-size option.
>
> The .debug cache is useful for various workflows, but may get in the way
> for some others, which is why we have ways to disable it.
>
Correct.

> For instance, when working on some project it is handy to have copies of
> binaries saved so that older perf.data files can find a file that was
> rewritten by the compiler when doing changes in it, etc.
>
> With the .debug cache and if using -g, we can get the older copy of the
> binary _and_ its sources, annotation for older versions continue to
> work, etc.
>
> - Arnaldo

2019-02-15 02:17:24

by Stephane Eranian

[permalink] [raw]
Subject: Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

On Thu, Feb 14, 2019 at 1:35 PM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Thu, Feb 14, 2019, 6:31 PM Stephane Eranian <[email protected] wrote:
>>
>> On Thu, Feb 14, 2019 at 6:00 AM Arnaldo Carvalho de Melo
>> <[email protected]> wrote:
>> >
>> > Em Thu, Feb 14, 2019 at 02:26:38PM +0100, Jiri Olsa escreveu:
>> > > On Thu, Feb 14, 2019 at 09:57:59AM -0300, Arnaldo Carvalho de Melo wrote:
>> > > > But with the build id in the MMAPs we wouldn't need to scan anything at
>> > > > 'perf record' time, just at 'perf archive' time, where we would scan
>> > > > everything, and as soon as we find a hit, we add that DSO to the list of
>> > > > things we need to put in the tarball.
>> >
>> > > ok.. it might little change the expected behavour in that you will not
>> > > have .debug cache populated until you run perf archive.. some profile
>> > > data might stop report after you reinstall the binary..
>> >
>> Today perf record does collect the buildids once monitor is completed,
>> it does one
>> pass over the perf.data file looking for MMAP records, or at least in
>> the version I am
>> more familiar with.
>
>
> It does look for the MMAP records, how else would it find the DSO pathnames?
>
I was not talking about the synthesize_mmap() code. I was talking
about process_buildids().

>
> - Arnaldo
>
> Sent from smartphone
>>
>>
>> > Well, nothing prevents us from continuing to, in 'perf record', go thru
>> > the perf.data just collected to populate the .debug cache, its just that
>> > we would do it just for that, for populating the cache, we wouldn't
>> > _have_ to do that for collecting the buildids.
>>
>> Sure, for compatibility reasons.
>> In pipe mode, this would also avoid the need for perf inject -b -i -,
>> which is a win.
>>
>> perf archive is useful if you do not have a way to locate the binary using only
>> the buildid on the analysis machine.
>> >
>> > > on the other hand '.debug' cache would stop growing uncontrolably..
>> > > so I think I'd be ok with this
>> >
>> I agree!
>>
>> > That is another problem, and one that has to be solved in a way similar
>> > to ccache's --max-size option.
>> >
>> > The .debug cache is useful for various workflows, but may get in the way
>> > for some others, which is why we have ways to disable it.
>> >
>> Correct.
>>
>> > For instance, when working on some project it is handy to have copies of
>> > binaries saved so that older perf.data files can find a file that was
>> > rewritten by the compiler when doing changes in it, etc.
>> >
>> > With the .debug cache and if using -g, we can get the older copy of the
>> > binary _and_ its sources, annotation for older versions continue to
>> > work, etc.
>> >
>> > - Arnaldo