2013-07-30 09:19:20

by Namhyung Kim

[permalink] [raw]
Subject: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

This patchset implements a front-end tool for kernel's ftrace. It
uses function_graph tracer by default and normal function tracer is
also supported. (Of course you need to enable those tracers in your
kernel first.)

v4 changes:
* use pid instead of tid for -p option (David)
* not to poll() for reading ftrace pipes (Jiri)
* rename to create_perf_header() (Jiri)
* canonicalize directory name (Jiri)
* show more info when -D option was given (Jiri)
* update documentation (Jiri)
* and few more bug fixes

I pushed it out to 'perf/ftrace-v4' branch on my tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks,
Namhyung


Namhyung Kim (17):
perf util: Save pid-cmdline mapping into tracing header
perf util: Add more debug message on failure path
perf tools: Introduce new 'ftrace' tool
perf ftrace: Add support for --pid option
perf ftrace: Add support for -a and -C option
perf ftrace: Split "live" sub-command
perf ftrace: Add 'record' sub-command
perf ftrace: Add 'show' sub-command
perf ftrace: Add 'report' sub-command
perf ftrace: Add dump_printf() for low-level debugging
perf ftrace: Use pager for displaying result
perf ftrace: Cleanup using ftrace_setup/teardown()
perf tools: Add document for perf-ftrace command
perf ftrace: Add a signal handler for SIGSEGV
perf ftrace: Add --clock option
perf ftrace: Show leaf-functions as oneliner
perf ftrace: Tidy up the function graph output of 'show' subcommand

tools/perf/Documentation/perf-ftrace.txt | 116 ++
tools/perf/Makefile | 1 +
tools/perf/builtin-ftrace.c | 1788 ++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
tools/perf/util/cpumap.c | 45 +
tools/perf/util/cpumap.h | 1 +
tools/perf/util/debug.c | 8 +-
tools/perf/util/debug.h | 2 +-
tools/perf/util/header.c | 4 +-
tools/perf/util/session.c | 2 +-
tools/perf/util/trace-event-info.c | 33 +-
tools/perf/util/trace-event-parse.c | 17 +
tools/perf/util/trace-event-read.c | 77 +-
tools/perf/util/trace-event.h | 1 +
16 files changed, 2077 insertions(+), 21 deletions(-)
create mode 100644 tools/perf/Documentation/perf-ftrace.txt
create mode 100644 tools/perf/builtin-ftrace.c


Following is the original description and example.
-----
It consists of 4 subcommands: live, record, show and report.

'perf ftrace live' just triggers ftrace and relay kernel buffer
contents to stdout. It does no processing in the tool side.

'perf ftrace record' starts ftrace and saves its result to per-cpu
files and a perf.header file in the perf.data.dir directory.
Recording was done by multiple threads (a thread per cpu) in order not
to miss events overrun. The perf.header file is compatible to current
perf.data file and contains useful information and sample data.

The sample data were synthesized for each recorded cpu to provide more
information - I'm not sure it's really needed though.

Once you had run 'perf ftrace record', you could play with other
subcommands.

'perf ftrace show' displays function traces like 'live' subcommand or
trace-cmd does. It's not useful than them at this time but it could
be improved soon.

'perf ftrace report' displays usual 'perf report' style output from
the function trace data. You can see which function is called most
frequently for example. Currently it uses 1 as a period value for
each entry but we might use funcgraph_exit->calltime to get proper
overhead later.

Example below:

# perf ftrace record sleep 0.1
# ls -l perf.data.dir
total 5568
-rw-r--r--. 1 root root 3514375 Apr 23 16:43 perf.header
-rw-r--r--. 1 root root 90112 Apr 23 16:43 trace-cpu0.buf
-rw-r--r--. 1 root root 0 Apr 23 16:43 trace-cpu1.buf
-rw-r--r--. 1 root root 2093056 Apr 23 16:43 trace-cpu2.buf
-rw-r--r--. 1 root root 0 Apr 23 16:43 trace-cpu3.buf

# perf ftrace show
overriding event (11) ftrace:funcgraph_entry with new print handler
overriding event (10) ftrace:funcgraph_exit with new print handler
0) 0.065 us | __fsnotify_parent();
0) | fsnotify() {
0) 0.060 us | __srcu_read_lock();
0) 0.040 us | __srcu_read_unlock();
0) 0.652 us | }
0) 0.040 us | fput();
0) | __audit_syscall_exit() {
0) | path_put() {
0) 0.037 us | dput();
0) 0.032 us | mntput();
0) 0.563 us | }
0) 0.035 us | unroll_tree_refs();
0) 0.035 us | kfree();
0) 1.284 us | }
0) | __audit_syscall_entry() {
0) 0.029 us | current_kernel_time();
0) 0.239 us | }
0) | sys_write() {
0) 0.155 us | fget_light();
0) | vfs_write() {
0) | rw_verify_area() {
0) | security_file_permission() {
0) | selinux_file_permission() {
0) | file_has_perm() {
0) | inode_has_perm.isra.31.constprop.61() {
0) 0.181 us | avc_has_perm_flags();
0) 0.405 us | }
0) 0.629 us | }
0) 1.024 us | }
0) 1.247 us | }
0) 1.596 us | }
0) | do_sync_write() {

# perf ftrace report
# ========
# captured on: Tue Jul 30 18:02:34 2013
# hostname : sejong.aot.lge.com
# os release : 3.9.10-100.fc17.x86_64
# perf version : 3.11.rc1.gcdd406
# arch : x86_64
# nrcpus online : 12
# nrcpus avail : 12
# cpudesc : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
# cpuid : GenuineIntel,6,45,7
# total memory : 32901840 kB
# cmdline : /home/namhyung/project/linux/tools/perf/perf ftrace record sleep 0.1
# event : name = ftrace:funcgraph_entry, type = 2, config = 0xb, config1 = 0x0,
# event : name = ftrace:funcgraph_exit, type = 2, config = 0xa, config1 = 0x0,
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 22K of event 'ftrace:funcgraph_entry'
# Event count (approx.): 22433
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............................
#
7.90% sleep [kernel.kallsyms] [k] _cond_resched
4.57% sleep [kernel.kallsyms] [k] ebitmap_get_bit
3.50% sleep [kernel.kallsyms] [k] __phys_addr
3.12% sleep [kernel.kallsyms] [k] _raw_spin_lock
3.09% sleep [kernel.kallsyms] [k] kmem_cache_free
2.93% sleep [kernel.kallsyms] [k] __slab_free
2.86% sleep [kernel.kallsyms] [k] __tlb_remove_page
2.35% sleep [kernel.kallsyms] [k] vm_normal_page
2.33% sleep [kernel.kallsyms] [k] page_remove_rmap
1.87% sleep [kernel.kallsyms] [k] mutex_unlock


--
1.7.11.7


2013-07-30 09:19:23

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 02/17] perf util: Add more debug message on failure path

From: Namhyung Kim <[email protected]>

It's helpful for debugging on tracing features.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/header.c | 4 ++-
tools/perf/util/trace-event-read.c | 53 ++++++++++++++++++++++++++------------
2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f558f83769af..11b08f24ba42 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2700,8 +2700,10 @@ static int perf_evsel__prepare_tracepoint_event(struct perf_evsel *evsel,
}

event = pevent_find_event(pevent, evsel->attr.config);
- if (event == NULL)
+ if (event == NULL) {
+ pr_debug("cannot find event format for %d\n", (int)evsel->attr.config);
return -1;
+ }

if (!evsel->name) {
snprintf(bf, sizeof(bf), "%s:%s", event->system, event->name);
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index e084e5e654b6..0e3b3f527320 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -262,39 +262,53 @@ static int read_header_files(struct pevent *pevent)

static int read_ftrace_file(struct pevent *pevent, unsigned long long size)
{
+ int ret;
char *buf;

buf = malloc(size);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

- if (do_read(buf, size) < 0) {
- free(buf);
- return -1;
+ ret = do_read(buf, size);
+ if (ret < 0) {
+ pr_debug("error reading ftrace file.\n");
+ goto out;
}

- parse_ftrace_file(pevent, buf, size);
+ ret = parse_ftrace_file(pevent, buf, size);
+ if (ret < 0)
+ pr_debug("error parsing ftrace file.\n");
+out:
free(buf);
- return 0;
+ return ret;
}

static int read_event_file(struct pevent *pevent, char *sys,
unsigned long long size)
{
+ int ret;
char *buf;

buf = malloc(size);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

- if (do_read(buf, size) < 0) {
+ ret = do_read(buf, size);
+ if (ret < 0) {
free(buf);
- return -1;
+ goto out;
}

- parse_event_file(pevent, buf, size, sys);
+ ret = parse_event_file(pevent, buf, size, sys);
+ if (ret < 0)
+ pr_debug("error parsing event file.\n");
+out:
free(buf);
- return 0;
+ return ret;
}

static int read_ftrace_files(struct pevent *pevent)
@@ -347,6 +361,7 @@ static int read_saved_cmdline(struct pevent *pevent)
{
unsigned long long size;
char *buf;
+ int ret;

/* it can have 0 size */
size = read8(pevent);
@@ -354,18 +369,22 @@ static int read_saved_cmdline(struct pevent *pevent)
return 0;

buf = malloc(size + 1);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

- if (do_read(buf, size) < 0) {
- free(buf);
- return -1;
+ ret = do_read(buf, size);
+ if (ret < 0) {
+ pr_debug("error reading saved cmdlines\n");
+ goto out;
}

parse_saved_cmdline(pevent, buf, size);
-
+ ret = 0;
+out:
free(buf);
- return 0;
+ return ret;
}

ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
--
1.7.11.7

2013-07-30 09:19:34

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 10/17] perf ftrace: Add dump_printf() for low-level debugging

From: Namhyung Kim <[email protected]>

For reusability, rename trace_event() to dump_raw_event() and pass
size as an argument. And use it in do_ftrace_report() to show raw
data of ftrace entries.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 6 ++++++
tools/perf/util/debug.c | 8 ++++----
tools/perf/util/debug.h | 2 +-
tools/perf/util/session.c | 2 +-
4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 0792bfdf168c..bb0a238b8509 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1330,6 +1330,12 @@ static int do_ftrace_report(struct perf_ftrace *ftrace)
goto out;
}

+ dump_printf(" .... event: %s\n", perf_evsel__name(evsel));
+ dump_raw_event(record, record->size);
+ dump_printf("%3d %llu %#llx [%#x]\n\n",
+ record->cpu, record->ts, record->offset,
+ record->size);
+
/* TODO: update sample.period using calltime */
if (!__hists__add_entry(&evsel->hists, &al, NULL,
sample.period, 0)) {
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 399e74c34c1a..2c42aaed4528 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -47,7 +47,7 @@ int dump_printf(const char *fmt, ...)
return ret;
}

-void trace_event(union perf_event *event)
+void dump_raw_event(void *event, int size)
{
unsigned char *raw_event = (void *)event;
const char *color = PERF_COLOR_BLUE;
@@ -58,9 +58,9 @@ void trace_event(union perf_event *event)

printf(".");
color_fprintf(stdout, color, "\n. ... raw event: size %d bytes\n",
- event->header.size);
+ size);

- for (i = 0; i < event->header.size; i++) {
+ for (i = 0; i < size; i++) {
if ((i & 15) == 0) {
printf(".");
color_fprintf(stdout, color, " %04x: ", i);
@@ -68,7 +68,7 @@ void trace_event(union perf_event *event)

color_fprintf(stdout, color, " %02x", raw_event[i]);

- if (((i & 15) == 15) || i == event->header.size-1) {
+ if (((i & 15) == 15) || i == size-1) {
color_fprintf(stdout, color, " ");
for (j = 0; j < 15-(i & 15); j++)
color_fprintf(stdout, color, " ");
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index efbd98805ad0..9cebb0cdb7bd 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -12,7 +12,7 @@ extern int verbose;
extern bool quiet, dump_trace;

int dump_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2)));
-void trace_event(union perf_event *event);
+void dump_raw_event(void *event, int size);

int ui__error(const char *format, ...) __attribute__((format(printf, 1, 2)));
int ui__warning(const char *format, ...) __attribute__((format(printf, 1, 2)));
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a0ce5a46abbe..8e6d05acab32 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -793,7 +793,7 @@ static void dump_event(struct perf_session *session, union perf_event *event,
printf("\n%#" PRIx64 " [%#x]: event: %d\n",
file_offset, event->header.size, event->header.type);

- trace_event(event);
+ dump_raw_event(event, event->header.size);

if (sample)
perf_session__print_tstamp(session, event, sample);
--
1.7.11.7

2013-07-30 09:19:26

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 06/17] perf ftrace: Split "live" sub-command

From: Namhyung Kim <[email protected]>

Separate out the default behavior to "live" subcommand. It's a
preparation to support more subcommands like "record" and "report".

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 133 ++++++++++++++++++++++++++------------------
1 file changed, 80 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 642aa49c66d7..1bb6d1ff0eb1 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -171,19 +171,13 @@ static int reset_tracing_cpu(void)
return 0;
}

-static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
+static int do_ftrace_live(struct perf_ftrace *ftrace)
{
char *trace_file;
int trace_fd;
char buf[4096];
- struct pollfd pollfd = {
- .events = POLLIN,
- };
-
- if (geteuid() != 0) {
- pr_err("ftrace only works for root!\n");
- return -1;
- }
+ /* sleep 1ms if no data read */
+ struct timespec req = { .tv_nsec = 1000000 };

signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
@@ -196,12 +190,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
if (write_tracing_file("trace", "0") < 0)
goto out;

- if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0) {
- goto out;
- }
-
if (set_tracing_pid(ftrace) < 0) {
pr_err("failed to set ftrace pid\n");
goto out_reset;
@@ -233,7 +221,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
}

fcntl(trace_fd, F_SETFL, O_NONBLOCK);
- pollfd.fd = trace_fd;

if (write_tracing_file("tracing_on", "1") < 0) {
pr_err("can't enable tracing\n");
@@ -243,16 +230,18 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
perf_evlist__start_workload(ftrace->evlist);

while (!done) {
- if (poll(&pollfd, 1, -1) < 0)
- break;
+ int n = read(trace_fd, buf, sizeof(buf));

- if (pollfd.revents & POLLIN) {
- int n = read(trace_fd, buf, sizeof(buf));
- if (n < 0)
- break;
- if (fwrite(buf, n, 1, stdout) != 1)
+ if (n < 0) {
+ if (errno == EINTR || errno == EAGAIN)
+ goto sleep;
+ else
break;
- }
+ } else if (n == 0) {
+sleep:
+ clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
+ } else if (fwrite(buf, n, 1, stdout) != 1)
+ break;
}

write_tracing_file("tracing_on", "0");
@@ -274,61 +263,99 @@ out:
return done ? 0 : -1;
}

-int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+static int
+__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret;
- struct perf_ftrace ftrace = {
- .target = { .uid = UINT_MAX, },
- };
- const char * const ftrace_usage[] = {
- "perf ftrace [<options>] [<command>]",
- "perf ftrace [<options>] -- <command> [<options>]",
+ int ret = -1;
+ const char * const live_usage[] = {
+ "perf ftrace live [<options>] [<command>]",
+ "perf ftrace live [<options>] -- <command> [<options>]",
NULL
};
- const struct option ftrace_options[] = {
- OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
+ const struct option live_options[] = {
+ OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
"tracer to use: function_graph or function"),
- OPT_STRING('p', "pid", &ftrace.target.pid, "pid",
+ OPT_STRING('p', "pid", &ftrace->target.pid, "pid",
"trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose"),
- OPT_BOOLEAN('a', "all-cpus", &ftrace.target.system_wide,
+ OPT_BOOLEAN('a', "all-cpus", &ftrace->target.system_wide,
"system-wide collection from all CPUs"),
- OPT_STRING('C', "cpu", &ftrace.target.cpu_list, "cpu",
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
"list of cpus to monitor"),
OPT_END()
};

- argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
- PARSE_OPT_STOP_AT_NON_OPTION);
- if (!argc && perf_target__none(&ftrace.target))
- usage_with_options(ftrace_usage, ftrace_options);
+ argc = parse_options(argc, argv, live_options, live_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc && perf_target__none(&ftrace->target))
+ usage_with_options(live_usage, live_options);

- ret = perf_target__validate(&ftrace.target);
+ ret = perf_target__validate(&ftrace->target);
if (ret) {
char errbuf[512];

- perf_target__strerror(&ftrace.target, ret, errbuf, 512);
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
pr_err("%s\n", errbuf);
return -EINVAL;
}

- ftrace.evlist = perf_evlist__new();
- if (ftrace.evlist == NULL)
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
return -ENOMEM;

- ret = perf_evlist__create_maps(ftrace.evlist, &ftrace.target);
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
if (ret < 0)
- goto out_delete_evlist;
+ goto out;
+
+ if (ftrace->tracer == NULL)
+ ftrace->tracer = DEFAULT_TRACER;
+
+ if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0)
+ goto out_maps;
+
+ ret = do_ftrace_live(ftrace);
+
+out_maps:
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
+int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+ int ret;
+ struct perf_ftrace ftrace = {
+ .target = { .uid = UINT_MAX, },
+ };
+ const char * const ftrace_usage[] = {
+ "perf ftrace {live} [<options>] [<command>]",
+ "perf ftrace {live} [<options>] -- <command> [<options>]",
+ NULL
+ };
+ const struct option ftrace_options[] = {
+ OPT_END()
+ };

- if (ftrace.tracer == NULL)
- ftrace.tracer = DEFAULT_TRACER;
+ argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(ftrace_usage, ftrace_options);

- ret = __cmd_ftrace(&ftrace, argc, argv);
+ if (geteuid() != 0) {
+ pr_err("ftrace only works for root!\n");
+ return -1;
+ }

- perf_evlist__delete_maps(ftrace.evlist);
-out_delete_evlist:
- perf_evlist__delete(ftrace.evlist);
+ if (strcmp(argv[0], "live") == 0) {
+ ret = __cmd_ftrace_live(&ftrace, argc, argv);
+ } else {
+ usage_with_options(ftrace_usage, ftrace_options);
+ }

return ret;
}
--
1.7.11.7

2013-07-30 09:19:31

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 11/17] perf ftrace: Use pager for displaying result

From: Namhyung Kim <[email protected]>

It's convenient to use pager when seeing many lines of result.

Note that setup_pager() should be called after perf_evlist__
prepare_workload() since they can interfere each other regarding
shared stdio streams.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index bb0a238b8509..f8e363b3c06d 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -227,6 +227,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
+ signal(SIGPIPE, sig_handler);

if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -1445,6 +1446,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
argv, false, true) < 0)
goto out_maps;

+ setup_pager();
+
ret = do_ftrace_live(ftrace);

out_maps:
@@ -1566,6 +1569,8 @@ __cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
if (ftrace->dirname == NULL)
ftrace->dirname = DEFAULT_DIRNAME;

+ setup_pager();
+
ret = do_ftrace_show(ftrace);

perf_evlist__delete_maps(ftrace->evlist);
@@ -1626,6 +1631,7 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)

perf_hpp__init();

+ setup_pager();
setup_sorting();

symbol_conf.exclude_other = false;
--
1.7.11.7

2013-07-30 09:19:49

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 16/17] perf ftrace: Show leaf-functions as oneliner

From: Namhyung Kim <[email protected]>

Detect leaf functions and print them in a same line.

Note that it only converts leaf-functions that doesn't have any other
records between entry and exit even in other cpus. I left other leaf
functions as is.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 87 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 76 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 25a07db864b4..8d4ec20e3854 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -874,34 +874,77 @@ function_handler(struct trace_seq *s, struct pevent_record *record,

#define TRACE_GRAPH_INDENT 2

+static struct pevent_record *peek_ordered_record(struct perf_ftrace *ftrace);
+static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace);
+
+static struct event_format *fgraph_exit_event;
+
static int
fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
- struct event_format *event, void *context __maybe_unused)
+ struct event_format *event, void *context)
{
unsigned long long depth;
unsigned long long val;
const char *func;
+ struct perf_ftrace *ftrace = context;
+ struct pevent_record *next;
+ bool is_leaf = false;
+ bool needs_free = false;
+ void *data;
+ int ret = -1;
int i;

+ /*
+ * record->data can be invalidated after calling peek_ordered_record()
+ * because it can unmap the current kbuffer page. Make a copy.
+ */
+ data = malloc(record->size);
+ if (data == NULL)
+ goto nested;
+
+ memcpy(data, record->data, record->size);
+ record->data = data;
+ needs_free = true;
+
+ /* detect leaf function and make it one-liner */
+ next = peek_ordered_record(ftrace);
+ if (next && next->cpu == record->cpu &&
+ pevent_data_type(event->pevent, next) == fgraph_exit_event->id) {
+ is_leaf = true;
+ /* consume record */
+ get_ordered_record(ftrace);
+ free(next);
+ }
+
+nested:
if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
- return trace_seq_putc(s, '!');
+ goto out;

/* Function */
for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
trace_seq_putc(s, ' ');

if (pevent_get_field_val(s, event, "func", record, &val, 1))
- return trace_seq_putc(s, '!');
+ goto out;

func = pevent_find_function(event->pevent, val);

if (func)
- trace_seq_printf(s, "%s() {", func);
+ trace_seq_printf(s, "%s()", func);
else
- trace_seq_printf(s, "%llx() {", val);
+ trace_seq_printf(s, "%llx()", val);

- trace_seq_putc(s, '\n');
- return 0;
+ if (is_leaf)
+ trace_seq_puts(s, ";\n");
+ else
+ trace_seq_puts(s, " {\n");
+
+ ret = 0;
+out:
+ if (needs_free)
+ free(record->data);
+
+ return ret;
}

static int
@@ -1103,7 +1146,8 @@ get_ftrace_event_record(struct perf_ftrace *ftrace,
return fra->record;
}

-static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
+static struct ftrace_report_arg *
+__get_ordered_record(struct perf_ftrace *ftrace)
{
struct ftrace_report_arg *fra = NULL;
struct ftrace_report_arg *tmp;
@@ -1117,9 +1161,26 @@ static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
fra = tmp;
}
}
+ return fra;
+}
+
+static struct pevent_record *peek_ordered_record(struct perf_ftrace *ftrace)
+{
+ struct ftrace_report_arg *fra = __get_ordered_record(ftrace);
+
+ if (fra)
+ return fra->record;
+
+ return NULL;
+}
+
+static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
+{
+ struct ftrace_report_arg *fra = __get_ordered_record(ftrace);

if (fra) {
- record = fra->record;
+ struct pevent_record *record = fra->record;
+
fra->record = NULL;
return record;
}
@@ -1175,10 +1236,10 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
function_handler, NULL);
pevent_register_event_handler(ftrace->pevent, -1,
"ftrace", "funcgraph_entry",
- fgraph_ent_handler, NULL);
+ fgraph_ent_handler, ftrace);
pevent_register_event_handler(ftrace->pevent, -1,
"ftrace", "funcgraph_exit",
- fgraph_ret_handler, NULL);
+ fgraph_ret_handler, ftrace);

if (perf_session__process_events(session, &show.tool) < 0) {
pr_err("failed to process events\n");
@@ -1186,6 +1247,10 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
goto out;
}

+ fgraph_exit_event = pevent_find_event_by_name(ftrace->pevent, "ftrace",
+ "funcgraph_exit");
+ BUG_ON(fgraph_exit_event == NULL);
+
trace_seq_init(&seq);

record = get_ordered_record(ftrace);
--
1.7.11.7

2013-07-30 09:19:47

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 17/17] perf ftrace: Tidy up the function graph output of 'show' subcommand

From: Namhyung Kim <[email protected]>

Now it doesn't call pevent_print_event() but prints context info in
itself using print_graph_duration(). Make it compact by only print
cpu number and duration:

# perf ftrace show
...
10) 0.065 us | __fsnotify_parent();
10) | fsnotify() {
10) 0.060 us | __srcu_read_lock();
10) 0.040 us | __srcu_read_unlock();
10) 0.652 us | }
10) 0.040 us | fput();
10) | __audit_syscall_exit() {
10) | path_put() {
10) 0.037 us | dput();
10) 0.032 us | mntput();
10) 0.563 us | }
10) 0.035 us | unroll_tree_refs();
10) 0.035 us | kfree();
10) 1.284 us | }
10) | __audit_syscall_entry() {
10) 0.029 us | current_kernel_time();
10) 0.239 us | }

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 8d4ec20e3854..9e78ec19caeb 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -879,6 +879,27 @@ static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace);

static struct event_format *fgraph_exit_event;

+static void
+print_graph_duration(struct trace_seq *s, struct event_format *event,
+ struct pevent_record *record)
+{
+ unsigned long long duration;
+ unsigned long long rettime, calltime;
+ unsigned long usec, nsec;
+
+ if (pevent_get_field_val(s, event, "rettime", record, &rettime, 1))
+ return;
+
+ if (pevent_get_field_val(s, event, "calltime", record, &calltime, 1))
+ return;
+
+ duration = rettime - calltime;
+ usec = duration / 1000;
+ nsec = duration % 1000;
+
+ trace_seq_printf(s, "%3d) %3lu.%03lu us | ", record->cpu, usec, nsec);
+}
+
static int
fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
struct event_format *event, void *context)
@@ -911,9 +932,14 @@ fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
if (next && next->cpu == record->cpu &&
pevent_data_type(event->pevent, next) == fgraph_exit_event->id) {
is_leaf = true;
+
+ print_graph_duration(s, fgraph_exit_event, next);
+
/* consume record */
get_ordered_record(ftrace);
free(next);
+ } else {
+ trace_seq_printf(s, "%3d) %*s | ", record->cpu, 10, "");
}

nested:
@@ -954,6 +980,8 @@ fgraph_ret_handler(struct trace_seq *s, struct pevent_record *record,
unsigned long long depth;
int i;

+ print_graph_duration(s, event, record);
+
if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
return trace_seq_putc(s, '!');

@@ -1265,9 +1293,12 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
continue;
}

- pevent_print_event(ftrace->pevent, &seq, record);
- trace_seq_do_printf(&seq);
+ if (!strcmp(ftrace->tracer, "function_graph"))
+ pevent_event_info(&seq, event, record);
+ else
+ pevent_print_event(ftrace->pevent, &seq, record);

+ trace_seq_do_printf(&seq);
trace_seq_reset(&seq);

free(record);
--
1.7.11.7

2013-07-30 09:20:31

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 14/17] perf ftrace: Add a signal handler for SIGSEGV

From: Namhyung Kim <[email protected]>

It's for debugging purpose.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index ad71d0412a7e..ed889d4dd0ba 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -41,6 +41,13 @@ struct perf_ftrace {

static bool done;

+static void sig_exit(int sig)
+{
+ psignal(sig, "perf");
+ signal(sig, SIG_DFL);
+ raise(sig);
+}
+
static void sig_handler(int sig __maybe_unused)
{
done = true;
@@ -228,6 +235,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
signal(SIGPIPE, sig_handler);
+ signal(SIGSEGV, sig_exit);

if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -613,6 +621,7 @@ static int do_ftrace_record(struct perf_ftrace *ftrace)
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
+ signal(SIGSEGV, sig_exit);

if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -1120,6 +1129,8 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
},
};

+ signal(SIGSEGV, sig_exit);
+
canonicalize_directory_name(ftrace->dirname);

scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
@@ -1211,6 +1222,8 @@ static int do_ftrace_report(struct perf_ftrace *ftrace)
struct machine *machine;
struct dso *dso;

+ signal(SIGSEGV, sig_exit);
+
canonicalize_directory_name(ftrace->dirname);

scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
--
1.7.11.7

2013-07-30 09:20:29

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 15/17] perf ftrace: Add --clock option

From: Namhyung Kim <[email protected]>

The --clock (-c) option is for controlling trace_clock. Default to
'perf' if exists, or 'local'.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index ed889d4dd0ba..25a07db864b4 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -35,6 +35,7 @@ struct perf_ftrace {
struct perf_target target;
const char *tracer;
const char *dirname;
+ const char *clock;
struct pevent *pevent;
bool show_full_info;
};
@@ -114,6 +115,9 @@ static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
if (reset_tracing_cpu() < 0)
return -1;

+ if (write_tracing_file("trace_clock", "local") < 0)
+ return -1;
+
return 0;
}

@@ -188,6 +192,24 @@ static int reset_tracing_cpu(void)
return 0;
}

+static int set_tracing_clock(struct perf_ftrace *ftrace)
+{
+ const char *clock = ftrace->clock;
+
+ if (clock == NULL)
+ clock = "perf";
+
+ if (!write_tracing_file("trace_clock", clock))
+ return 0;
+
+ /* exit if user specified an invalid clock */
+ if (ftrace->clock)
+ return -1;
+
+ pr_debug("'perf' clock is not supported.. falling back to 'local' clock\n");
+ return write_tracing_file("trace_clock", "local");
+}
+
static int setup_tracing_files(struct perf_ftrace *ftrace)
{
int ret = -1;
@@ -213,6 +235,11 @@ static int setup_tracing_files(struct perf_ftrace *ftrace)
goto out;
}

+ if (set_tracing_clock(ftrace) < 0) {
+ pr_err("failed to set trace clock\n");
+ goto out;
+ }
+
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
goto out;
@@ -1475,6 +1502,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
"system-wide collection from all CPUs"),
OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
"list of cpus to monitor"),
+ OPT_STRING('c', "clock", &ftrace->clock, "clock",
+ "clock to be used for tracer"),
OPT_END()
};

@@ -1517,6 +1546,8 @@ __cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
"list of cpus to monitor"),
OPT_STRING('o', "output", &ftrace->dirname, "dirname",
"input directory name to use (default: perf.data)"),
+ OPT_STRING('c', "clock", &ftrace->clock, "clock",
+ "clock to be used for tracer"),
OPT_END()
};

--
1.7.11.7

2013-07-30 09:21:29

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 09/17] perf ftrace: Add 'report' sub-command

From: Namhyung Kim <[email protected]>

The ftrace report command is for analyzing ftrace result as usual perf
report style. Internal processing of the ftrace buffer is similar to
the 'show' sub-command but it synthesizes necessary information like
thread, dso, map and symbol from saved trace info.

It currently count number of samples as a period and it can be
extended to use calltime of funcgraph_exit in the future.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 282 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 280 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index c54ab6e90d42..0792bfdf168c 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -21,6 +21,7 @@
#include "util/target.h"
#include "util/thread_map.h"
#include "util/cpumap.h"
+#include "util/sort.h"
#include "util/trace-event.h"
#include "../lib/traceevent/kbuffer.h"
#include "../lib/traceevent/event-parse.h"
@@ -35,6 +36,7 @@ struct perf_ftrace {
const char *tracer;
const char *dirname;
struct pevent *pevent;
+ bool show_full_info;
};

static bool done;
@@ -1176,6 +1178,214 @@ out:
return ret;
}

+struct cmdline_list {
+ struct cmdline_list *next;
+ char *comm;
+ int pid;
+};
+
+struct func_list {
+ struct func_list *next;
+ unsigned long long addr;
+ char *func;
+ char *mod;
+};
+
+static int do_ftrace_report(struct perf_ftrace *ftrace)
+{
+ int ret = -1;
+ char buf[PATH_MAX];
+ unsigned long nr_samples;
+ struct perf_session *session;
+ struct perf_evsel *evsel;
+ struct pevent_record *record;
+ struct perf_ftrace_report report = {
+ .ftrace = ftrace,
+ .tool = {
+ .sample = process_sample_event,
+ },
+ };
+ struct cmdline_list *cmdline;
+ struct func_list *func;
+ struct machine *machine;
+ struct dso *dso;
+
+ canonicalize_directory_name(ftrace->dirname);
+
+ scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
+
+ session = perf_session__new(buf, O_RDONLY, false, false, &report.tool);
+ if (session == NULL) {
+ pr_err("failed to create a session\n");
+ return -1;
+ }
+
+ ftrace->pevent = session->pevent;
+
+ if (perf_session__process_events(session, &report.tool) < 0) {
+ pr_err("failed to process events\n");
+ goto out;
+ }
+
+ machine = machines__findnew(&session->machines, HOST_KERNEL_ID);
+
+ /* Synthesize thread info from saved cmdlines */
+ cmdline = ftrace->pevent->cmdlist;
+ while (cmdline) {
+ struct thread *thread;
+
+ thread = machine__findnew_thread(machine, cmdline->pid);
+ if (thread && !thread->comm_set)
+ thread__set_comm(thread, cmdline->comm);
+
+ cmdline = cmdline->next;
+ }
+
+ /* Synthesize kernel dso and symbol info from saved kallsyms */
+ func = ftrace->pevent->funclist;
+ while (func) {
+ struct symbol *sym;
+
+ scnprintf(buf, sizeof(buf), "[%s]",
+ func->mod ? func->mod : "kernel.kallsyms");
+
+ dso = dso__kernel_findnew(machine, buf, NULL, DSO_TYPE_KERNEL);
+ if (dso == NULL) {
+ pr_debug("can't find or allocate dso %s\n", buf);
+ continue;
+ }
+
+ sym = symbol__new(func->addr, 0, STB_GLOBAL, func->func);
+ if (sym == NULL) {
+ pr_debug("failed to allocate new symbol\n");
+ continue;
+ }
+ symbols__insert(&dso->symbols[MAP__FUNCTION], sym);
+
+ func = func->next;
+ }
+
+ /* Generate kernel maps */
+ list_for_each_entry(dso, &machine->kernel_dsos, node) {
+ struct map *map = map__new2(0, dso, MAP__FUNCTION);
+ if (map == NULL) {
+ pr_debug("failed to allocate new map\n");
+ goto out;
+ }
+
+ symbols__fixup_end(&dso->symbols[MAP__FUNCTION]);
+ map__fixup_start(map);
+ map__fixup_end(map);
+
+ dso__set_loaded(dso, MAP__FUNCTION);
+
+ map_groups__insert(&machine->kmaps, map);
+ if (strcmp(dso->name, "[kernel.kallsyms]") == 0)
+ machine->vmlinux_maps[MAP__FUNCTION] = map;
+ }
+
+ /* FIXME: no need to get ordered */
+ record = get_ordered_record(ftrace);
+ while (record) {
+ int type;
+ struct addr_location al;
+ union perf_event event = {
+ .header = {
+ .misc = PERF_RECORD_MISC_KERNEL,
+ },
+ };
+ struct perf_sample sample = {
+ .cpu = record->cpu,
+ .raw_data = record->data,
+ .period = 1,
+ };
+ struct format_field *field;
+ unsigned long long val;
+
+ type = pevent_data_type(ftrace->pevent, record);
+ evsel = perf_evlist__find_tracepoint_by_id(session->evlist,
+ type);
+ if (evsel == NULL) {
+ pr_warning("no event found for type %d\n", type);
+ continue;
+ }
+
+ event.ip.pid = pevent_data_pid(ftrace->pevent, record);
+
+ if (!strcmp(perf_evsel__name(evsel), "ftrace:function"))
+ field = pevent_find_field(evsel->tp_format, "ip");
+ else
+ field = pevent_find_field(evsel->tp_format, "func");
+
+ if (pevent_read_number_field(field, record->data, &val) < 0) {
+ pr_err("failed to parse function address\n");
+ goto out;
+ }
+ event.ip.ip = val;
+
+ if (perf_event__preprocess_sample(&event, machine, &al,
+ &sample, NULL) < 0) {
+ pr_err("problem processing %d event, skipping it.\n",
+ event.header.type);
+ goto out;
+ }
+
+ /* TODO: update sample.period using calltime */
+ if (!__hists__add_entry(&evsel->hists, &al, NULL,
+ sample.period, 0)) {
+ pr_err("failed to add a hist entry\n");
+ goto out;
+ }
+
+ evsel->hists.stats.total_period += sample.period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ free(record);
+ record = get_ordered_record(ftrace);
+ }
+ ret = 0;
+
+ perf_session__fprintf_info(session, stdout, ftrace->show_full_info);
+
+ nr_samples = 0;
+ list_for_each_entry(evsel, &session->evlist->entries, node) {
+ struct hists *hists = &evsel->hists;
+
+ hists__collapse_resort(hists);
+ hists__output_resort(&evsel->hists);
+ nr_samples += hists->stats.nr_events[PERF_RECORD_SAMPLE];
+ }
+
+ if (nr_samples == 0) {
+ pr_warning("The %s file has no samples!\n", session->filename);
+ goto out;
+ }
+
+ list_for_each_entry(evsel, &session->evlist->entries, node) {
+ struct hists *hists = &evsel->hists;
+ const char *evname = perf_evsel__name(evsel);
+ u64 nr_events = hists->stats.total_period;
+ char unit;
+
+ nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
+ nr_samples = convert_unit(nr_samples, &unit);
+ fprintf(stdout, "# Samples: %lu%c", nr_samples, unit);
+ if (evname != NULL)
+ fprintf(stdout, " of event '%s'", evname);
+
+ fprintf(stdout, "\n# Event count (approx.): %" PRIu64, nr_events);
+ fprintf(stdout, "\n#\n");
+
+ hists__fprintf(hists, true, 0, 0, 0.0, stdout);
+ fprintf(stdout, "\n\n");
+ }
+
+out:
+ free_ftrace_report_args(ftrace);
+ perf_session__delete(session);
+ return ret;
+}
+
static int
__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
@@ -1359,6 +1569,72 @@ out:
return ret;
}

+static int
+__cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret = -1;
+ const char * const report_usage[] = {
+ "perf ftrace report [<options>]",
+ NULL
+ };
+ const struct option report_options[] = {
+ OPT_STRING('i', "input", &ftrace->dirname, "dirname",
+ "input directory name to use (default: perf.data)"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
+ "list of cpus to monitor"),
+ OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+ "sort by key(s): pid, comm, dso, symbol, cpu"),
+ OPT_BOOLEAN('I', "show-info", &ftrace->show_full_info,
+ "Display extended information like cpu/numa topology"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, report_options, report_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (argc)
+ usage_with_options(report_usage, report_options);
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ perf_hpp__init();
+
+ setup_sorting();
+
+ symbol_conf.exclude_other = false;
+ symbol_conf.try_vmlinux_path = false;
+ symbol__init();
+
+ ret = do_ftrace_report(ftrace);
+
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret;
@@ -1366,8 +1642,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
.target = { .uid = UINT_MAX, },
};
const char * const ftrace_usage[] = {
- "perf ftrace {live|record|show} [<options>] [<command>]",
- "perf ftrace {live|record|show} [<options>] -- <command> [<options>]",
+ "perf ftrace {live|record|show|report} [<options>] [<command>]",
+ "perf ftrace {live|record|show|report} [<options>] -- <command> [<options>]",
NULL
};
const struct option ftrace_options[] = {
@@ -1390,6 +1666,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
ret = __cmd_ftrace_record(&ftrace, argc, argv);
} else if (strcmp(argv[0], "show") == 0) {
ret = __cmd_ftrace_show(&ftrace, argc, argv);
+ } else if (strncmp(argv[0], "rep", 3) == 0) {
+ ret = __cmd_ftrace_report(&ftrace, argc, argv);
} else {
usage_with_options(ftrace_usage, ftrace_options);
}
--
1.7.11.7

2013-07-30 09:21:27

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 12/17] perf ftrace: Cleanup using ftrace_setup/teardown()

From: Namhyung Kim <[email protected]>

The ftrace sub-commands share some common code so that factor it out
to ftrace_setup() and ftrace_teardown() helpers.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 168 +++++++++++++++++---------------------------
1 file changed, 64 insertions(+), 104 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index f8e363b3c06d..ad71d0412a7e 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1393,10 +1393,59 @@ out:
return ret;
}

+static int ftrace_setup(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret;
+ char errbuf[512];
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->tracer == NULL)
+ ftrace->tracer = DEFAULT_TRACER;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ if (argc) {
+ ret = perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true);
+ if (ret < 0)
+ goto out_maps;
+ }
+ return 0;
+
+out_maps:
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
+static void ftrace_teardown(struct perf_ftrace *ftrace)
+{
+ perf_evlist__delete_maps(ftrace->evlist);
+ perf_evlist__delete(ftrace->evlist);
+}
+
static int
__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const live_usage[] = {
"perf ftrace live [<options>] [<command>]",
"perf ftrace live [<options>] -- <command> [<options>]",
@@ -1421,47 +1470,22 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(live_usage, live_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->tracer == NULL)
- ftrace->tracer = DEFAULT_TRACER;
-
- if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0)
- goto out_maps;
+ return ret;

setup_pager();

ret = do_ftrace_live(ftrace);

-out_maps:
- perf_evlist__delete_maps(ftrace->evlist);
-out:
- perf_evlist__delete(ftrace->evlist);
-
+ ftrace_teardown(ftrace);
return ret;
}

static int
__cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const record_usage[] = {
"perf ftrace record [<options>] [<command>]",
"perf ftrace record [<options>] -- <command> [<options>]",
@@ -1488,39 +1512,13 @@ __cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(record_usage, record_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->tracer == NULL)
- ftrace->tracer = DEFAULT_TRACER;
-
- if (ftrace->dirname == NULL)
- ftrace->dirname = DEFAULT_DIRNAME;
-
- if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0)
- goto out_maps;
+ return ret;

ret = do_ftrace_record(ftrace);

-out_maps:
perf_evlist__delete_maps(ftrace->evlist);
-out:
perf_evlist__delete(ftrace->evlist);

return ret;
@@ -1529,7 +1527,7 @@ out:
static int
__cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const show_usage[] = {
"perf ftrace show [<options>]",
NULL
@@ -1549,41 +1547,22 @@ __cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
if (argc)
usage_with_options(show_usage, show_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->dirname == NULL)
- ftrace->dirname = DEFAULT_DIRNAME;
+ return ret;

setup_pager();

ret = do_ftrace_show(ftrace);

- perf_evlist__delete_maps(ftrace->evlist);
-out:
- perf_evlist__delete(ftrace->evlist);
-
+ ftrace_teardown(ftrace);
return ret;
}

static int
__cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const report_usage[] = {
"perf ftrace report [<options>]",
NULL
@@ -1609,25 +1588,9 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
if (argc)
usage_with_options(report_usage, report_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->dirname == NULL)
- ftrace->dirname = DEFAULT_DIRNAME;
+ return ret;

perf_hpp__init();

@@ -1640,10 +1603,7 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)

ret = do_ftrace_report(ftrace);

- perf_evlist__delete_maps(ftrace->evlist);
-out:
- perf_evlist__delete(ftrace->evlist);
-
+ ftrace_teardown(ftrace);
return ret;
}

--
1.7.11.7

2013-07-30 09:21:26

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 13/17] perf tools: Add document for perf-ftrace command

From: Namhyung Kim <[email protected]>

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-ftrace.txt | 116 +++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
create mode 100644 tools/perf/Documentation/perf-ftrace.txt

diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
new file mode 100644
index 000000000000..4463a33e367a
--- /dev/null
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -0,0 +1,116 @@
+perf-ftrace(1)
+==============
+
+NAME
+----
+perf-ftrace - A front end for kernel's ftrace
+
+SYNOPSIS
+--------
+[verse]
+'perf ftrace' {live|record} [<options>] [<command>]
+'perf ftrace' {show|report} [<options>]
+
+DESCRIPTION
+-----------
+This command reads the ftrace buffer and displays the trace recorded.
+
+There are several variants of perf ftrace:
+
+ 'perf ftrace live <command>' to see a live trace of kernel functions
+ via trace_pipe during executing the <command>. If <command> is not
+ specified, one of the target options (-p, -a or -C) should be given.
+ It just prints out the result to stdout and doesn't save any files.
+
+ 'perf ftrace record <command>' to record trace entries of kernel
+ functions during the execution of the <command>. Like 'perf ftrace
+ live', at least one of <command> and/or target options should be
+ given in order to start recording. The recorded results are saved
+ under a directory ('perf.data.dir' by default) and will be used by
+ other perf-ftrace tools like 'show' and 'report'.
+
+ 'perf ftrace show' to see the trace of recorded functions. It shows
+ functions sorted by time so the end result might be interspersed if
+ there's a concurrent execution.
+
+ 'perf ftrace report' to display the result in an usual perf-report
+ style - entries are sorted by given sort keys and output is resorted
+ by its overhead.
+
+OPTIONS
+-------
+<command>...::
+ Any command you can specify in a shell.
+
+-t::
+--tracer=::
+ The ftrace tracer to be used (default: function_graph).
+ Currently, only 'function' and 'function_graph' are supported
+ by 'record' command. 'live' command accepts any available
+ tracer in system and outputs trace result directly from the
+ ftrace's trace_pipe.
+
+-p::
+--pid=::
+ Record events on existing process ID (comma separated list).
+ Used by 'live' and 'record' subcommands only.
+
+-a::
+ Force system-wide collection. Scripts run without a <command>
+ normally use -a by default, while scripts run with a <command>
+ normally don't - this option allows the latter to be run in
+ system-wide mode. Used by 'live' and 'record' subcommands only.
+
+-C::
+--cpu:: Only process samples for the list of CPUs provided.
+ Multiple CPUs can be provided as a comma-separated list with
+ no space: 0,1. Ranges of CPUs are specified with -: 0-2.
+ Default is to report samples on all online CPUs.
+
+-s::
+--sort=::
+ Sort histogram entries by given key(s) - multiple keys can be
+ specified in CSV format. Following sort keys are available:
+ pid, comm, dso, symbol, cpu.
+
+ Each key has following meaning:
+
+ - comm: command (name) of the task
+ - pid: command and tid of the task
+ - dso: name of library or module executed at the time of sample
+ - symbol: name of function executed at the time of sample
+ - cpu: cpu number the task ran at the time of sample
+
+ By default, comm, dso and symbol keys are used.
+ (i.e. --sort comm,dso,symbol)
+ Used by 'report' subcommands only.
+
+-i::
+--input=::
+ Input directory name excluding '.dir' at the end.
+ (default: perf.data)
+
+-o::
+--output=::
+ Output directory name excluding '.dir' at the end.
+ (default: perf.data)
+
+-v::
+--verbose::
+ Be more verbose (show counter open errors, etc).
+
+-D::
+--dump-raw-trace::
+ Dump raw trace in ASCII. Used by 'report' subcommands only.
+
+-I::
+--show-info::
+ Display extended information about the record. This adds
+ information which may be very large and thus may clutter the
+ display. It currently includes: cpu and numa topology of the
+ host system. It can only be used with the 'report' subcommand.
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-report[1],
+linkperf:perf-script[1]
--
1.7.11.7

2013-07-30 09:22:20

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 08/17] perf ftrace: Add 'show' sub-command

From: Namhyung Kim <[email protected]>

The ftrace show subcommand is for viewing recorded ftrace files. It
enters perf.data.dir directory and open perf.header file to find out
necessary information. And then read out per-cpu trace records using
kbuffer helper and print them to stdout in time order.

It only shows its basic form so function graph doesn't show duration/
overhead and no leaf entry handling is provided yet. Maybe it can be
handled by a proper plugin in the libtraceevent.

Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 435 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 433 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 46b5c136e4bb..c54ab6e90d42 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -13,6 +13,7 @@
#include <signal.h>
#include <fcntl.h>
#include <dirent.h>
+#include <sys/mman.h>

#include "util/debug.h"
#include "util/parse-options.h"
@@ -21,6 +22,8 @@
#include "util/thread_map.h"
#include "util/cpumap.h"
#include "util/trace-event.h"
+#include "../lib/traceevent/kbuffer.h"
+#include "../lib/traceevent/event-parse.h"


#define DEFAULT_TRACER "function_graph"
@@ -31,6 +34,7 @@ struct perf_ftrace {
struct perf_target target;
const char *tracer;
const char *dirname;
+ struct pevent *pevent;
};

static bool done;
@@ -800,6 +804,379 @@ out_reset:
}

static int
+function_handler(struct trace_seq *s, struct pevent_record *record,
+ struct event_format *event, void *context __maybe_unused)
+{
+ struct pevent *pevent = event->pevent;
+ unsigned long long function;
+ const char *func;
+
+ if (pevent_get_field_val(s, event, "ip", record, &function, 1))
+ return trace_seq_putc(s, '!');
+
+ func = pevent_find_function(pevent, function);
+ if (func)
+ trace_seq_printf(s, "%s <-- ", func);
+ else
+ trace_seq_printf(s, "0x%llx", function);
+
+ if (pevent_get_field_val(s, event, "parent_ip", record, &function, 1))
+ return trace_seq_putc(s, '!');
+
+ func = pevent_find_function(pevent, function);
+ if (func)
+ trace_seq_printf(s, "%s", func);
+ else
+ trace_seq_printf(s, "0x%llx", function);
+
+ trace_seq_putc(s, '\n');
+ return 0;
+}
+
+#define TRACE_GRAPH_INDENT 2
+
+static int
+fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
+ struct event_format *event, void *context __maybe_unused)
+{
+ unsigned long long depth;
+ unsigned long long val;
+ const char *func;
+ int i;
+
+ if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
+ return trace_seq_putc(s, '!');
+
+ /* Function */
+ for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
+ trace_seq_putc(s, ' ');
+
+ if (pevent_get_field_val(s, event, "func", record, &val, 1))
+ return trace_seq_putc(s, '!');
+
+ func = pevent_find_function(event->pevent, val);
+
+ if (func)
+ trace_seq_printf(s, "%s() {", func);
+ else
+ trace_seq_printf(s, "%llx() {", val);
+
+ trace_seq_putc(s, '\n');
+ return 0;
+}
+
+static int
+fgraph_ret_handler(struct trace_seq *s, struct pevent_record *record,
+ struct event_format *event, void *context __maybe_unused)
+{
+ unsigned long long depth;
+ int i;
+
+ if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
+ return trace_seq_putc(s, '!');
+
+ /* Function */
+ for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
+ trace_seq_putc(s, ' ');
+
+ trace_seq_puts(s, "}\n");
+ return 0;
+}
+
+struct perf_ftrace_report {
+ struct perf_ftrace *ftrace;
+ struct perf_tool tool;
+};
+
+struct ftrace_report_arg {
+ struct list_head node;
+ struct pevent_record *record;
+ struct kbuffer *kbuf;
+ void *map;
+ int cpu;
+ int fd;
+ int done;
+ off_t offset;
+ off_t size;
+};
+
+static LIST_HEAD(ftrace_cpu_buffers);
+
+static int process_sample_event(struct perf_tool *tool,
+ union perf_event * event __maybe_unused,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ struct perf_ftrace *ftrace;
+ struct perf_ftrace_report *report;
+ struct ftrace_report_arg *fra;
+ struct stat statbuf;
+ enum kbuffer_long_size long_size;
+ enum kbuffer_endian endian;
+ char buf[PATH_MAX];
+
+ report = container_of(tool, struct perf_ftrace_report, tool);
+ ftrace = report->ftrace;
+
+ if (perf_target__has_cpu(&ftrace->target)) {
+ int i;
+ bool found = false;
+
+ for (i = 0; i < cpu_map__nr(ftrace->evlist->cpus); i++) {
+ if ((int)sample->cpu == ftrace->evlist->cpus->map[i]) {
+ found = true;
+ break;
+ }
+ }
+ if (!found)
+ return 0;
+ }
+
+ fra = zalloc(sizeof(*fra));
+ if (fra == NULL)
+ return -1;
+
+ fra->cpu = sample->cpu;
+
+ scnprintf(buf, sizeof(buf), "%s.dir/trace-cpu%d.buf",
+ ftrace->dirname, fra->cpu);
+
+ fra->fd = open(buf, O_RDONLY);
+ if (fra->fd < 0)
+ goto out;
+
+ if (fstat(fra->fd, &statbuf) < 0)
+ goto out_close;
+
+ fra->size = statbuf.st_size;
+ if (fra->size == 0) {
+ /* skip zero-size buffers */
+ close(fra->fd);
+ free(fra);
+ return 0;
+ }
+
+ /*
+ * FIXME: What if pevent->page_size is smaller than current page size?
+ */
+ fra->map = mmap(NULL, pevent_get_page_size(ftrace->pevent),
+ PROT_READ, MAP_PRIVATE, fra->fd, fra->offset);
+ if (fra->map == MAP_FAILED)
+ goto out_close;
+
+ fra->offset = 0;
+
+ if (pevent_is_file_bigendian(ftrace->pevent))
+ endian = KBUFFER_ENDIAN_BIG;
+ else
+ endian = KBUFFER_ENDIAN_LITTLE;
+
+ if (pevent_get_long_size(ftrace->pevent) == 8)
+ long_size = KBUFFER_LSIZE_8;
+ else
+ long_size = KBUFFER_LSIZE_4;
+
+ fra->kbuf = kbuffer_alloc(long_size, endian);
+ if (fra->kbuf == NULL)
+ goto out_unmap;
+
+ if (ftrace->pevent->old_format)
+ kbuffer_set_old_format(fra->kbuf);
+
+ kbuffer_load_subbuffer(fra->kbuf, fra->map);
+
+ pr_debug2("setup kbuffer for cpu%d\n", fra->cpu);
+ list_add_tail(&fra->node, &ftrace_cpu_buffers);
+ return 0;
+
+out_unmap:
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+out_close:
+ close(fra->fd);
+out:
+ free(fra);
+ return -1;
+}
+
+static struct pevent_record *
+get_next_ftrace_event_record(struct perf_ftrace *ftrace,
+ struct ftrace_report_arg *fra)
+{
+ struct pevent_record *record;
+ unsigned long long ts;
+ void *data;
+
+retry:
+ data = kbuffer_read_event(fra->kbuf, &ts);
+ if (data) {
+ record = zalloc(sizeof(*record));
+ if (record == NULL) {
+ pr_err("memory allocation failure\n");
+ return NULL;
+ }
+
+ record->ts = ts;
+ record->cpu = fra->cpu;
+ record->data = data;
+ record->size = kbuffer_event_size(fra->kbuf);
+ record->record_size = kbuffer_curr_size(fra->kbuf);
+ record->offset = kbuffer_curr_offset(fra->kbuf);
+ record->missed_events = kbuffer_missed_events(fra->kbuf);
+ record->ref_count = 1;
+
+ kbuffer_next_event(fra->kbuf, NULL);
+ return record;
+ }
+
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+ fra->map = NULL;
+
+ if (fra->done)
+ return NULL;
+
+ fra->offset += pevent_get_page_size(ftrace->pevent);
+ if (fra->offset >= fra->size) {
+ /* EOF */
+ fra->done = 1;
+ return NULL;
+ }
+
+ fra->map = mmap(NULL, pevent_get_page_size(ftrace->pevent),
+ PROT_READ, MAP_PRIVATE, fra->fd, fra->offset);
+ if (fra->map == MAP_FAILED) {
+ pr_err("memory mapping failed\n");
+ return NULL;
+ }
+
+ kbuffer_load_subbuffer(fra->kbuf, fra->map);
+
+ goto retry;
+}
+
+static struct pevent_record *
+get_ftrace_event_record(struct perf_ftrace *ftrace,
+ struct ftrace_report_arg *fra)
+{
+ if (fra->record == NULL)
+ fra->record = get_next_ftrace_event_record(ftrace, fra);
+
+ return fra->record;
+}
+
+static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
+{
+ struct ftrace_report_arg *fra = NULL;
+ struct ftrace_report_arg *tmp;
+ struct pevent_record *record;
+ unsigned long long min_ts = LLONG_MAX;
+
+ list_for_each_entry(tmp, &ftrace_cpu_buffers, node) {
+ record = get_ftrace_event_record(ftrace, tmp);
+ if (record && record->ts < min_ts) {
+ min_ts = record->ts;
+ fra = tmp;
+ }
+ }
+
+ if (fra) {
+ record = fra->record;
+ fra->record = NULL;
+ return record;
+ }
+ return NULL;
+}
+
+static void free_ftrace_report_args(struct perf_ftrace *ftrace)
+{
+ struct ftrace_report_arg *fra, *tmp;
+
+ list_for_each_entry_safe(fra, tmp, &ftrace_cpu_buffers, node) {
+ list_del(&fra->node);
+
+ /* don't care about the errors */
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+ kbuffer_free(fra->kbuf);
+ free(fra->record);
+ close(fra->fd);
+ free(fra);
+ }
+}
+
+static int do_ftrace_show(struct perf_ftrace *ftrace)
+{
+ int ret = 0;
+ char buf[PATH_MAX];
+ struct perf_session *session;
+ struct pevent_record *record;
+ struct trace_seq seq;
+ struct perf_ftrace_report show = {
+ .ftrace = ftrace,
+ .tool = {
+ .sample = process_sample_event,
+ },
+ };
+
+ canonicalize_directory_name(ftrace->dirname);
+
+ scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
+
+ session = perf_session__new(buf, O_RDONLY, false, false, &show.tool);
+ if (session == NULL) {
+ pr_err("failed to create a session\n");
+ return -1;
+ }
+
+ ftrace->pevent = session->pevent;
+
+ pevent_register_event_handler(ftrace->pevent, -1,
+ "ftrace", "function",
+ function_handler, NULL);
+ pevent_register_event_handler(ftrace->pevent, -1,
+ "ftrace", "funcgraph_entry",
+ fgraph_ent_handler, NULL);
+ pevent_register_event_handler(ftrace->pevent, -1,
+ "ftrace", "funcgraph_exit",
+ fgraph_ret_handler, NULL);
+
+ if (perf_session__process_events(session, &show.tool) < 0) {
+ pr_err("failed to process events\n");
+ ret = -1;
+ goto out;
+ }
+
+ trace_seq_init(&seq);
+
+ record = get_ordered_record(ftrace);
+ while (record) {
+ int type;
+ struct event_format *event;
+
+ type = pevent_data_type(ftrace->pevent, record);
+ event = pevent_find_event(ftrace->pevent, type);
+ if (!event) {
+ pr_warning("no event found for type %d", type);
+ continue;
+ }
+
+ pevent_print_event(ftrace->pevent, &seq, record);
+ trace_seq_do_printf(&seq);
+
+ trace_seq_reset(&seq);
+
+ free(record);
+ record = get_ordered_record(ftrace);
+ }
+
+ trace_seq_destroy(&seq);
+
+out:
+ free_ftrace_report_args(ftrace);
+ perf_session__delete(session);
+ return ret;
+}
+
+static int
__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
int ret = -1;
@@ -930,6 +1307,58 @@ out:
return ret;
}

+static int
+__cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret = -1;
+ const char * const show_usage[] = {
+ "perf ftrace show [<options>]",
+ NULL
+ };
+ const struct option show_options[] = {
+ OPT_STRING('i', "input", &ftrace->dirname, "dirname",
+ "input directory name to use (default: perf.data)"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
+ "list of cpus to monitor"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, show_options, show_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (argc)
+ usage_with_options(show_usage, show_options);
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ ret = do_ftrace_show(ftrace);
+
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret;
@@ -937,8 +1366,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
.target = { .uid = UINT_MAX, },
};
const char * const ftrace_usage[] = {
- "perf ftrace {live|record} [<options>] [<command>]",
- "perf ftrace {live|record} [<options>] -- <command> [<options>]",
+ "perf ftrace {live|record|show} [<options>] [<command>]",
+ "perf ftrace {live|record|show} [<options>] -- <command> [<options>]",
NULL
};
const struct option ftrace_options[] = {
@@ -959,6 +1388,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
ret = __cmd_ftrace_live(&ftrace, argc, argv);
} else if (strncmp(argv[0], "rec", 3) == 0) {
ret = __cmd_ftrace_record(&ftrace, argc, argv);
+ } else if (strcmp(argv[0], "show") == 0) {
+ ret = __cmd_ftrace_show(&ftrace, argc, argv);
} else {
usage_with_options(ftrace_usage, ftrace_options);
}
--
1.7.11.7

2013-07-30 09:22:18

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 07/17] perf ftrace: Add 'record' sub-command

From: Namhyung Kim <[email protected]>

The ftrace record command is for saving raw ftrace buffer contents
which can be get from per_cpu/cpuX/trace_pipe_raw.

Since ftrace events are generated very frequently so single thread for
recording mostly resulted in buffer overruns. Thus it uses per-cpu
recorder thread to prevent such cases and they save the contents to
their own files.

These per-cpu data files are saved in a directory so that they can be
easily found when needed. I chose the default directory name as
"perf.data.dir" and the first two (i.e. "perf.data") can be changed
with -o option. The structure of the directory looks like:

$ tree perf.data.dir
perf.data.dir/
|-- perf.header
|-- trace-cpu0.buf
|-- trace-cpu1.buf
|-- trace-cpu2.buf
`-- trace-cpu3.buf

In addition to trace-cpuX.buf files, it has perf.header file also.
The perf.header file is compatible with existing perf.data format and
contains usual event information, feature mask and sample data. The
sample data is synthesized to indicate given cpu has a record file.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 640 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 623 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 1bb6d1ff0eb1..46b5c136e4bb 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -12,6 +12,7 @@
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
+#include <dirent.h>

#include "util/debug.h"
#include "util/parse-options.h"
@@ -19,14 +20,17 @@
#include "util/target.h"
#include "util/thread_map.h"
#include "util/cpumap.h"
+#include "util/trace-event.h"


#define DEFAULT_TRACER "function_graph"
+#define DEFAULT_DIRNAME "perf.data"

struct perf_ftrace {
struct perf_evlist *evlist;
struct perf_target target;
const char *tracer;
+ const char *dirname;
};

static bool done;
@@ -171,40 +175,56 @@ static int reset_tracing_cpu(void)
return 0;
}

-static int do_ftrace_live(struct perf_ftrace *ftrace)
+static int setup_tracing_files(struct perf_ftrace *ftrace)
{
- char *trace_file;
- int trace_fd;
- char buf[4096];
- /* sleep 1ms if no data read */
- struct timespec req = { .tv_nsec = 1000000 };
-
- signal(SIGINT, sig_handler);
- signal(SIGUSR1, sig_handler);
- signal(SIGCHLD, sig_handler);
+ int ret = -1;

- if (reset_tracing_files(ftrace) < 0)
+ if (reset_tracing_files(ftrace) < 0) {
+ pr_err("failed to reset tracing files\n");
goto out;
+ }

/* reset ftrace buffer */
- if (write_tracing_file("trace", "0") < 0)
+ if (write_tracing_file("trace", "0") < 0) {
+ pr_err("failed to reset ftrace buffer\n");
goto out;
+ }

if (set_tracing_pid(ftrace) < 0) {
pr_err("failed to set ftrace pid\n");
- goto out_reset;
+ goto out;
}

if (set_tracing_cpu(ftrace) < 0) {
pr_err("failed to set tracing cpumask\n");
- goto out_reset;
+ goto out;
}

if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
- goto out_reset;
+ goto out;
}

+ ret = 0;
+out:
+ return ret;
+}
+
+static int do_ftrace_live(struct perf_ftrace *ftrace)
+{
+ char *trace_file;
+ int trace_fd;
+ char buf[4096];
+ /* sleep 1ms if no data read */
+ struct timespec req = { .tv_nsec = 1000000 };
+
+ signal(SIGINT, sig_handler);
+ signal(SIGUSR1, sig_handler);
+ signal(SIGCHLD, sig_handler);
+
+ if (setup_tracing_files(ftrace) < 0)
+ goto out_reset;
+
trace_file = get_tracing_file("trace_pipe");
if (!trace_file) {
pr_err("failed to open trace_pipe\n");
@@ -259,7 +279,523 @@ out_close_fd:
close(trace_fd);
out_reset:
reset_tracing_files(ftrace);
+ return done ? 0 : -1;
+}
+
+static int alloc_ftrace_evsel(struct perf_ftrace *ftrace)
+{
+ if (!strcmp(ftrace->tracer, "function")) {
+ if (perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+ "function", NULL) < 0) {
+ pr_err("failed to allocate ftrace event\n");
+ return -1;
+ }
+ } else if (!strcmp(ftrace->tracer, "function_graph")) {
+ if (perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+ "funcgraph_entry", NULL) ||
+ perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+ "funcgraph_exit", NULL)) {
+ pr_err("failed to allocate ftrace event\n");
+ return -1;
+ }
+ } else {
+ pr_err("Not supported tracer: %s\n", ftrace->tracer);
+ return -1;
+ }
+ return 0;
+}
+
+static void canonicalize_directory_name(const char *dirname)
+{
+ char *suffix = strstr(dirname, ".dir");
+
+ if (suffix) {
+ if (suffix[4] == '\0' || suffix[4] == '/')
+ *suffix = '\0';
+ }
+}
+
+static int remove_directory(const char *pathname)
+{
+ DIR *dir;
+ int ret = 0;
+ struct dirent *dent;
+ char namebuf[PATH_MAX];
+
+ dir = opendir(pathname);
+ if (dir == NULL)
+ return 0;
+
+ while ((dent = readdir(dir)) != NULL && !ret) {
+ struct stat statbuf;
+
+ if (dent->d_name[0] == '.')
+ continue;
+
+ scnprintf(namebuf, sizeof(namebuf), "%s/%s",
+ pathname, dent->d_name);
+
+ ret = stat(namebuf, &statbuf);
+ if (ret < 0) {
+ pr_debug("stat failed\n");
+ break;
+ }
+
+ if (S_ISREG(statbuf.st_mode))
+ ret = unlink(namebuf);
+ else if (S_ISDIR(statbuf.st_mode))
+ ret = remove_directory(namebuf);
+ else {
+ pr_debug("unknown file.\n");
+ ret = -1;
+ }
+ }
+ closedir(dir);
+
+ if (ret < 0)
+ return ret;
+
+ return rmdir(pathname);
+}
+
+static int create_perf_header(struct perf_ftrace *ftrace)
+{
+ int err;
+ char buf[PATH_MAX];
+ struct stat statbuf;
+
+ canonicalize_directory_name(ftrace->dirname);
+
+ scnprintf(buf, sizeof(buf), "%s.dir", ftrace->dirname);
+
+ if (!stat(buf, &statbuf) && S_ISDIR(statbuf.st_mode)) {
+ /* same name already exists - rename to *.old.dir */
+ char *old_name = malloc(strlen(buf) + 5);
+ if (old_name == NULL)
+ return -1;
+
+ scnprintf(old_name, strlen(buf) + 5,
+ "%s.old.dir", ftrace->dirname);
+
+ if (remove_directory(old_name) < 0) {
+ perror("rmdir");
+ return -1;
+ }
+
+ if (rename(buf, old_name) < 0) {
+ perror("rename");
+ free(old_name);
+ return -1;
+ }
+
+ free(old_name);
+ }
+
+ err = mkdir(buf, 0755);
+ if (err < 0) {
+ perror("mkdir");
+ return -1;
+ }
+
+ strcat(buf, "/perf.header");
+
+ err = open(buf, O_RDWR | O_CREAT | O_TRUNC, 0644);
+ return err;
+}
+
+static void sig_dummy_handler(int sig __maybe_unused)
+{
+ while (!done)
+ continue;
+}
+
+enum {
+ RECORD_STATE__ERROR = -1,
+ RECORD_STATE__INIT,
+ RECORD_STATE__READY,
+ RECORD_STATE__DONE,
+};
+
+struct ftrace_record_arg {
+ struct perf_ftrace *ftrace;
+ int cpu;
+ int state;
+ pthread_t id;
+ struct list_head node;
+};
+
+static int recorder_count;
+pthread_cond_t recorder_ready_cond = PTHREAD_COND_INITIALIZER;
+pthread_cond_t recorder_start_cond = PTHREAD_COND_INITIALIZER;
+pthread_mutex_t recorder_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void *record_ftrace_raw_buffer(void *arg)
+{
+ struct ftrace_record_arg *fra = arg;
+ char buf[4096];
+ char *trace_file;
+ int trace_fd;
+ int output_fd;
+ off_t byte_written = 0;
+ sigset_t sigmask;
+ /* sleep 1ms if no data read */
+ struct timespec req = { .tv_nsec = 1000000 };
+
+ fra->state = RECORD_STATE__ERROR;
+
+ snprintf(buf, sizeof(buf), "per_cpu/cpu%d/trace_pipe_raw", fra->cpu);
+
+ trace_file = get_tracing_file(buf);
+ if (!trace_file) {
+ pr_err("failed to get trace_pipe_raw\n");
+ goto out;
+ }
+
+ trace_fd = open(trace_file, O_RDONLY);
+
+ put_tracing_file(trace_file);
+
+ if (trace_fd < 0) {
+ pr_err("failed to open trace_pipe_raw\n");
+ goto out;
+ }
+
+ snprintf(buf, sizeof(buf), "%s.dir/trace-cpu%d.buf",
+ fra->ftrace->dirname, fra->cpu);
+
+ output_fd = open(buf, O_WRONLY|O_CREAT|O_TRUNC, 0644);
+ if (output_fd < 0) {
+ pr_err("failed to open output file\n");
+ goto out_close;
+ }
+
+ fra->state = RECORD_STATE__READY;
+
+ /*
+ * block all signals but SIGUSR2.
+ * It'll be used to unblock a recorder to finish.
+ */
+ sigfillset(&sigmask);
+ sigdelset(&sigmask, SIGUSR2);
+ pthread_sigmask(SIG_SETMASK, &sigmask,NULL);
+
+ signal(SIGUSR2, sig_dummy_handler);
+
+ fcntl(trace_fd, F_SETFL, O_NONBLOCK);
+
+ /* Now I'm ready */
+ pthread_mutex_lock(&recorder_mutex);
+ recorder_count++;
+ pthread_cond_signal(&recorder_ready_cond);
+ pthread_cond_wait(&recorder_start_cond, &recorder_mutex);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ pr_debug2("now recording for cpu%d\n", fra->cpu);
+
+ while (!done) {
+ int n = read(trace_fd, buf, sizeof(buf));
+
+ if (n < 0) {
+ if (errno == EINTR || errno == EAGAIN)
+ goto sleep;
+ else
+ break;
+ } else if (n == 0) {
+sleep:
+ clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
+ } else if (write(output_fd, buf, n) != n)
+ break;
+
+ byte_written += n;
+ }
+
+ /* read remaining buffer contents */
+ while (true) {
+ int n = read(trace_fd, buf, sizeof(buf));
+
+ if (n < 0)
+ goto out_close;
+ if (n == 0)
+ break;
+ if (write(output_fd, buf, n) != n)
+ goto out_close;
+
+ byte_written += n;
+ }
+ fra->state = RECORD_STATE__DONE;
+
+out_close:
+ close(trace_fd);
out:
+ if (fra->state == RECORD_STATE__ERROR) {
+ /*
+ * We need to update recorder_count in this case also
+ * in order to prevent deadlocking in the main thread.
+ */
+ pthread_mutex_lock(&recorder_mutex);
+ recorder_count++;
+ pthread_cond_signal(&recorder_ready_cond);
+ pthread_mutex_unlock(&recorder_mutex);
+ }
+ return fra;
+}
+
+static void *synthesize_raw_data(struct perf_evsel *evsel)
+{
+ void *data = NULL;
+ u32 data_size;
+
+ if (!strcmp(evsel->tp_format->name, "function")) {
+ struct {
+ unsigned short common_type;
+ unsigned char common_flags;
+ unsigned char common_preempt_count;
+ int common_pid;
+ int common_padding;
+
+ unsigned long ip;
+ unsigned long parent_ip;
+ } function_format = {
+ .common_type = evsel->attr.config,
+ };
+
+ data_size = sizeof(function_format);
+
+ data = malloc(data_size + sizeof(u32));
+ if (data == NULL)
+ return NULL;
+
+ memcpy(data, &data_size, sizeof(data_size));
+ memcpy(data + sizeof(data_size), &function_format,
+ sizeof(function_format));
+ } else if (!strcmp(evsel->tp_format->name, "funcgraph_entry")) {
+ struct {
+ unsigned short common_type;
+ unsigned char common_flags;
+ unsigned char common_preempt_count;
+ int common_pid;
+ int common_padding;
+
+ unsigned long func;
+ int depth;
+ } funcgraph_entry_format = {
+ .common_type = evsel->attr.config,
+ };
+
+ data_size = sizeof(funcgraph_entry_format);
+
+ data = malloc(data_size + sizeof(u32));
+ if (data == NULL)
+ return NULL;
+
+ memcpy(data, &data_size, sizeof(data_size));
+ memcpy(data + sizeof(data_size), &funcgraph_entry_format,
+ sizeof(funcgraph_entry_format));
+ }
+ return data;
+}
+
+static int do_ftrace_record(struct perf_ftrace *ftrace)
+{
+ int i, err, feat;
+ int perf_fd;
+ LIST_HEAD(recorders);
+ struct perf_session *session;
+ struct ftrace_record_arg *fra, *tmp;
+
+ signal(SIGINT, sig_handler);
+ signal(SIGUSR1, sig_handler);
+ signal(SIGCHLD, sig_handler);
+
+ if (setup_tracing_files(ftrace) < 0)
+ goto out_reset;
+
+ alloc_ftrace_evsel(ftrace);
+
+ perf_fd = create_perf_header(ftrace);
+ if (perf_fd < 0) {
+ pr_err("failed to create perf directory\n");
+ goto out_reset;
+ }
+
+ /* just use a dummy session for header recording */
+ session = zalloc(sizeof(*session));
+ if (session == NULL) {
+ pr_err("failed to allocate perf session\n");
+ goto out_close;
+ }
+ session->evlist = ftrace->evlist;
+
+ for (feat = HEADER_FIRST_FEATURE; feat < HEADER_LAST_FEATURE; feat++)
+ perf_header__set_feat(&session->header, feat);
+
+ perf_header__clear_feat(&session->header, HEADER_BUILD_ID);
+ perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
+ perf_header__clear_feat(&session->header, HEADER_PMU_MAPPINGS);
+ perf_header__clear_feat(&session->header, HEADER_GROUP_DESC);
+
+ err = perf_session__write_header(session, ftrace->evlist,
+ perf_fd, false);
+ if (err < 0) {
+ pr_err("failed to write perf header\n");
+ goto out_session;
+ }
+
+ /*
+ * We record ftrace's per_cpu buffer so that it'd better having
+ * corresponding cpu maps anyway.
+ */
+ if (!perf_target__has_cpu(&ftrace->target)) {
+ struct cpu_map *new_map;
+
+ new_map = cpu_map__new(NULL);
+ if (new_map == NULL) {
+ pr_err("failed to create new cpu map\n");
+ goto out_session;
+ }
+
+ /* replace existing cpu map */
+ cpu_map__delete(ftrace->evlist->cpus);
+ ftrace->evlist->cpus = new_map;
+ }
+
+ for (i = 0; i < cpu_map__nr(ftrace->evlist->cpus); i++) {
+ fra = malloc(sizeof(*fra));
+ if (fra == NULL) {
+ pr_err("not enough memory!\n");
+ goto out_fra;
+ }
+
+ fra->ftrace = ftrace;
+ fra->cpu = ftrace->evlist->cpus->map[i];
+ fra->state = RECORD_STATE__INIT;
+ list_add_tail(&fra->node, &recorders);
+
+ err = pthread_create(&fra->id, NULL,
+ record_ftrace_raw_buffer, fra);
+ if (err < 0) {
+ pr_err("failed to create recorder thread\n");
+ goto out_fra;
+ }
+ }
+
+ /* wait for all recorders ready */
+ pthread_mutex_lock(&recorder_mutex);
+ while (recorder_count != cpu_map__nr(ftrace->evlist->cpus))
+ pthread_cond_wait(&recorder_ready_cond, &recorder_mutex);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ list_for_each_entry(fra, &recorders, node) {
+ if (fra->state != RECORD_STATE__READY) {
+ pr_err("cpu%d: failed to start recorder", fra->cpu);
+ goto out_fra;
+ }
+ }
+
+ if (write_tracing_file("tracing_on", "1") < 0) {
+ pr_err("can't enable tracing\n");
+ goto out_fra;
+ }
+
+ perf_evlist__start_workload(ftrace->evlist);
+
+ pr_debug2("start recording per cpu buffers\n");
+ pthread_mutex_lock(&recorder_mutex);
+ pthread_cond_broadcast(&recorder_start_cond);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ /* wait for signal/finish */
+ pause();
+
+ if (write_tracing_file("tracing_on", "0") < 0) {
+ pr_err("can't disable tracing\n");
+ goto out_fra;
+ }
+
+ /* signal recorders to terminate */
+ list_for_each_entry(fra, &recorders, node) {
+ pr_debug2("killing recorder thread for cpu%d\n", fra->cpu);
+ pthread_kill(fra->id, SIGUSR2);
+ }
+
+ list_for_each_entry(fra, &recorders, node)
+ pthread_join(fra->id, NULL);
+
+ /* synthesize sample data */
+ list_for_each_entry(fra, &recorders, node) {
+ struct perf_evsel *evsel = perf_evlist__first(ftrace->evlist);
+ union perf_event event = {
+ .sample = {
+ .header = {
+ .type = PERF_RECORD_SAMPLE,
+ .misc = PERF_RECORD_MISC_KERNEL,
+ .size = sizeof(event.sample.header) +
+ evsel->sample_size,
+ },
+ },
+ };
+ struct perf_sample sample = {
+ .cpu = fra->cpu,
+ .period = 1,
+ };
+ void *raw_data;
+ u32 raw_size;
+ int orig_size;
+
+ if (fra->state != RECORD_STATE__DONE) {
+ pr_warning("recorder failed for some reason on cpu%d\n",
+ fra->cpu);
+ continue;
+ }
+
+ perf_event__synthesize_sample(&event, evsel->attr.sample_type,
+ &sample, false);
+
+ raw_data = synthesize_raw_data(evsel);
+ if (raw_data == NULL) {
+ pr_err("synthesizing raw sample failed\n");
+ goto out_fra;
+ }
+
+ /*
+ * start of raw data is the size of raw data excluding itself.
+ */
+ raw_size = sizeof(u32) + (*(u32 *) raw_data);
+
+ orig_size = event.sample.header.size;
+ event.sample.header.size += raw_size;
+
+ err = write(perf_fd, &event.sample, orig_size);
+ if (err != orig_size) {
+ pr_err("write error occurred\n");
+ free(raw_data);
+ goto out_fra;
+ }
+
+ err = write(perf_fd, raw_data, raw_size);
+ free(raw_data);
+
+ if (err != (int)raw_size) {
+ pr_err("write error occurred\n");
+ goto out_fra;
+ }
+
+ session->header.data_size += event.sample.header.size;
+ }
+
+ perf_session__write_header(session, ftrace->evlist, perf_fd, true);
+
+out_fra:
+ list_for_each_entry_safe(fra, tmp, &recorders, node) {
+ list_del(&fra->node);
+ free(fra);
+ }
+out_session:
+ free(session);
+out_close:
+ close(perf_fd);
+out_reset:
+ reset_tracing_files(ftrace);
return done ? 0 : -1;
}

@@ -326,6 +862,74 @@ out:
return ret;
}

+static int
+__cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret = -1;
+ const char * const record_usage[] = {
+ "perf ftrace record [<options>] [<command>]",
+ "perf ftrace record [<options>] -- <command> [<options>]",
+ NULL
+ };
+ const struct option record_options[] = {
+ OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
+ "tracer to use: function_graph or function"),
+ OPT_STRING('p', "pid", &ftrace->target.pid, "pid",
+ "trace on existing process id"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_BOOLEAN('a', "all-cpus", &ftrace->target.system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
+ "list of cpus to monitor"),
+ OPT_STRING('o', "output", &ftrace->dirname, "dirname",
+ "input directory name to use (default: perf.data)"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, record_options, record_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc && perf_target__none(&ftrace->target))
+ usage_with_options(record_usage, record_options);
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->tracer == NULL)
+ ftrace->tracer = DEFAULT_TRACER;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0)
+ goto out_maps;
+
+ ret = do_ftrace_record(ftrace);
+
+out_maps:
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret;
@@ -333,8 +937,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
.target = { .uid = UINT_MAX, },
};
const char * const ftrace_usage[] = {
- "perf ftrace {live} [<options>] [<command>]",
- "perf ftrace {live} [<options>] -- <command> [<options>]",
+ "perf ftrace {live|record} [<options>] [<command>]",
+ "perf ftrace {live|record} [<options>] -- <command> [<options>]",
NULL
};
const struct option ftrace_options[] = {
@@ -353,6 +957,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)

if (strcmp(argv[0], "live") == 0) {
ret = __cmd_ftrace_live(&ftrace, argc, argv);
+ } else if (strncmp(argv[0], "rec", 3) == 0) {
+ ret = __cmd_ftrace_record(&ftrace, argc, argv);
} else {
usage_with_options(ftrace_usage, ftrace_options);
}
--
1.7.11.7

2013-07-30 09:23:05

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 05/17] perf ftrace: Add support for -a and -C option

From: Namhyung Kim <[email protected]>

The -a/--all-cpus and -C/--cpu option is for controlling tracing cpus.
To do that, add and use cpu_map__sprintf() function.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 69 +++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/cpumap.c | 45 +++++++++++++++++++++++++++++
tools/perf/util/cpumap.h | 1 +
3 files changed, 115 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index bd415f2b1cde..642aa49c66d7 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -18,6 +18,7 @@
#include "util/evlist.h"
#include "util/target.h"
#include "util/thread_map.h"
+#include "util/cpumap.h"


#define DEFAULT_TRACER "function_graph"
@@ -80,6 +81,8 @@ static int append_tracing_file(const char *name, const char *val)
return __write_tracing_file(name, val, true);
}

+static int reset_tracing_cpu(void);
+
static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
{
if (write_tracing_file("tracing_on", "0") < 0)
@@ -91,6 +94,9 @@ static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
if (write_tracing_file("set_ftrace_pid", " ") < 0)
return -1;

+ if (reset_tracing_cpu() < 0)
+ return -1;
+
return 0;
}

@@ -111,6 +117,60 @@ static int set_tracing_pid(struct perf_ftrace *ftrace)
return 0;
}

+static int set_tracing_cpu(struct perf_ftrace *ftrace)
+{
+ char *cpumask;
+ size_t mask_size;
+ int ret;
+ int last_cpu;
+ struct cpu_map *cpumap = ftrace->evlist->cpus;
+
+ if (!perf_target__has_cpu(&ftrace->target))
+ return 0;
+
+ last_cpu = cpumap->map[cpumap->nr - 1];
+ mask_size = (last_cpu + 3) / 4 + 1;
+ mask_size += last_cpu / 32; /* ',' is needed for every 32th cpus */
+
+ cpumask = malloc(mask_size);
+ if (cpumask == NULL) {
+ pr_debug("failed to allocate cpu mask\n");
+ return -1;
+ }
+
+ cpu_map__sprintf(cpumap, cpumask);
+
+ ret = write_tracing_file("tracing_cpumask", cpumask);
+
+ free(cpumask);
+ return ret;
+}
+
+static int reset_tracing_cpu(void)
+{
+ char *cpumask;
+ size_t mask_size;
+ int last_cpu;
+ struct cpu_map *cpumap = cpu_map__new(NULL);
+
+ last_cpu = cpumap->map[cpumap->nr - 1];
+ mask_size = (last_cpu + 3) / 4 + 1;
+ mask_size += last_cpu / 32; /* ',' is needed for every 32th cpus */
+
+ cpumask = malloc(mask_size);
+ if (cpumask == NULL) {
+ pr_debug("failed to allocate cpu mask\n");
+ return -1;
+ }
+
+ cpu_map__sprintf(cpumap, cpumask);
+
+ write_tracing_file("tracing_cpumask", cpumask);
+
+ free(cpumask);
+ return 0;
+}
+
static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
{
char *trace_file;
@@ -147,6 +207,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
goto out_reset;
}

+ if (set_tracing_cpu(ftrace) < 0) {
+ pr_err("failed to set tracing cpumask\n");
+ goto out_reset;
+ }
+
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
goto out_reset;
@@ -227,6 +292,10 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
"trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose"),
+ OPT_BOOLEAN('a', "all-cpus", &ftrace.target.system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_STRING('C', "cpu", &ftrace.target.cpu_list, "cpu",
+ "list of cpus to monitor"),
OPT_END()
};

diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index beb8cf9f9976..7933839915ea 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -187,6 +187,51 @@ size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp)
return printed + fprintf(fp, "\n");
}

+static char hex_char(char val)
+{
+ if (0 <= val && val <= 9)
+ return val + '0';
+ if (10 <= val && val < 16)
+ return val - 10 + 'a';
+ return '?';
+}
+
+size_t cpu_map__sprintf(struct cpu_map *map, char *buf)
+{
+ int i, cpu;
+ char *ptr = buf;
+ unsigned char *bitmap;
+ int last_cpu = map->map[map->nr - 1];
+
+ bitmap = zalloc((last_cpu + 7) / 8);
+ if (bitmap == NULL) {
+ buf[0] = '\0';
+ return 0;
+ }
+
+ for (i = 0; i < map->nr; i++) {
+ cpu = map->map[i];
+ bitmap[cpu / 8] |= 1 << (cpu % 8);
+ }
+
+ for (cpu = last_cpu / 4 * 4; cpu >= 0; cpu -= 4) {
+ unsigned char bits = bitmap[cpu / 8];
+
+ if (cpu % 8)
+ bits >>= 4;
+ else
+ bits &= 0xf;
+
+ *ptr++ = hex_char(bits);
+ if ((cpu % 32) == 0 && cpu > 0)
+ *ptr++ = ',';
+ }
+ *ptr++ = '\0';
+
+ free(bitmap);
+ return ptr - buf;
+}
+
struct cpu_map *cpu_map__dummy_new(void)
{
struct cpu_map *cpus = malloc(sizeof(*cpus) + sizeof(int));
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index b123bb9d6f55..a9a7f19f89d7 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -14,6 +14,7 @@ struct cpu_map *cpu_map__dummy_new(void);
void cpu_map__delete(struct cpu_map *map);
struct cpu_map *cpu_map__read(FILE *file);
size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp);
+size_t cpu_map__sprintf(struct cpu_map *map, char *buf);
int cpu_map__get_socket(struct cpu_map *map, int idx);
int cpu_map__get_core(struct cpu_map *map, int idx);
int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp);
--
1.7.11.7

2013-07-30 09:23:04

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 03/17] perf tools: Introduce new 'ftrace' tool

From: Namhyung Kim <[email protected]>

The ftrace command is a simple wrapper of kernel's ftrace
functionality. It only supports single thread tracing currently and
just reads trace_pipe in text and then write it to stdout.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-ftrace.c | 228 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
5 files changed, 232 insertions(+)
create mode 100644 tools/perf/builtin-ftrace.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index bfd12d02a304..bad639cedfd3 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -423,6 +423,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o
BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
+BUILTIN_OBJS += $(OUTPUT)builtin-ftrace.o
BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o
BUILTIN_OBJS += $(OUTPUT)builtin-mem.o

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
new file mode 100644
index 000000000000..6f3dd02c5b9a
--- /dev/null
+++ b/tools/perf/builtin-ftrace.c
@@ -0,0 +1,228 @@
+/*
+ * builtin-ftrace.c
+ *
+ * Copyright (c) 2013 LG Electronics, Namhyung Kim <[email protected]>
+ *
+ * Released under the GPL v2.
+ */
+
+#include "builtin.h"
+#include "perf.h"
+
+#include <unistd.h>
+#include <signal.h>
+
+#include "util/debug.h"
+#include "util/parse-options.h"
+#include "util/evlist.h"
+#include "util/target.h"
+#include "util/thread_map.h"
+
+
+#define DEFAULT_TRACER "function_graph"
+
+struct perf_ftrace {
+ struct perf_evlist *evlist;
+ struct perf_target target;
+ const char *tracer;
+};
+
+static bool done;
+
+static void sig_handler(int sig __maybe_unused)
+{
+ done = true;
+}
+
+static int write_tracing_file(const char *name, const char *val)
+{
+ char *file;
+ int fd, ret = -1;
+ ssize_t size = strlen(val);
+
+ file = get_tracing_file(name);
+ if (!file) {
+ pr_debug("cannot get tracing file: %s\n", name);
+ return -1;
+ }
+
+ fd = open(file, O_WRONLY);
+ if (fd < 0) {
+ pr_debug("cannot open tracing file: %s\n", name);
+ goto out;
+ }
+
+ if (write(fd, val, size) == size)
+ ret = 0;
+ else
+ pr_debug("write '%s' to tracing/%s failed\n", val, name);
+
+ close(fd);
+out:
+ put_tracing_file(file);
+ return ret;
+}
+
+static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
+{
+ if (write_tracing_file("tracing_on", "0") < 0)
+ return -1;
+
+ if (write_tracing_file("current_tracer", "nop") < 0)
+ return -1;
+
+ if (write_tracing_file("set_ftrace_pid", " ") < 0)
+ return -1;
+
+ return 0;
+}
+
+static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ char *trace_file;
+ int trace_fd;
+ char *trace_pid;
+ char buf[4096];
+ struct pollfd pollfd = {
+ .events = POLLIN,
+ };
+
+ if (geteuid() != 0) {
+ pr_err("ftrace only works for root!\n");
+ return -1;
+ }
+
+ if (argc < 1)
+ return -1;
+
+ signal(SIGINT, sig_handler);
+ signal(SIGUSR1, sig_handler);
+ signal(SIGCHLD, sig_handler);
+
+ reset_tracing_files(ftrace);
+
+ /* reset ftrace buffer */
+ if (write_tracing_file("trace", "0") < 0)
+ goto out;
+
+ if (perf_evlist__prepare_workload(ftrace->evlist, &ftrace->target,
+ argv, false, true) < 0)
+ goto out;
+
+ if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
+ pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+ goto out;
+ }
+
+ if (asprintf(&trace_pid, "%d", ftrace->evlist->threads->map[0]) < 0) {
+ pr_err("failed to allocate pid string\n");
+ goto out;
+ }
+
+ if (write_tracing_file("set_ftrace_pid", trace_pid) < 0) {
+ pr_err("failed to set pid: %s\n", trace_pid);
+ goto out_free_pid;
+ }
+
+ trace_file = get_tracing_file("trace_pipe");
+ if (!trace_file) {
+ pr_err("failed to open trace_pipe\n");
+ goto out_free_pid;
+ }
+
+ trace_fd = open(trace_file, O_RDONLY);
+
+ put_tracing_file(trace_file);
+
+ if (trace_fd < 0) {
+ pr_err("failed to open trace_pipe\n");
+ goto out_free_pid;
+ }
+
+ fcntl(trace_fd, F_SETFL, O_NONBLOCK);
+ pollfd.fd = trace_fd;
+
+ if (write_tracing_file("tracing_on", "1") < 0) {
+ pr_err("can't enable tracing\n");
+ goto out_close_fd;
+ }
+
+ perf_evlist__start_workload(ftrace->evlist);
+
+ while (!done) {
+ if (poll(&pollfd, 1, -1) < 0)
+ break;
+
+ if (pollfd.revents & POLLIN) {
+ int n = read(trace_fd, buf, sizeof(buf));
+ if (n < 0)
+ break;
+ if (fwrite(buf, n, 1, stdout) != 1)
+ break;
+ }
+ }
+
+ write_tracing_file("tracing_on", "0");
+
+ /* read remaining buffer contents */
+ while (true) {
+ int n = read(trace_fd, buf, sizeof(buf));
+ if (n <= 0)
+ break;
+ if (fwrite(buf, n, 1, stdout) != 1)
+ break;
+ }
+
+out_close_fd:
+ close(trace_fd);
+out_free_pid:
+ free(trace_pid);
+out:
+ reset_tracing_files(ftrace);
+
+ return done ? 0 : -1;
+}
+
+int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+ int ret;
+ struct perf_ftrace ftrace = {
+ .target = { .uid = UINT_MAX, },
+ };
+ const char * const ftrace_usage[] = {
+ "perf ftrace [<options>] [<command>]",
+ "perf ftrace [<options>] -- <command> [<options>]",
+ NULL
+ };
+ const struct option ftrace_options[] = {
+ OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
+ "tracer to use: function_graph or function"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(ftrace_usage, ftrace_options);
+
+ ftrace.evlist = perf_evlist__new();
+ if (ftrace.evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace.evlist, &ftrace.target);
+ if (ret < 0)
+ goto out_delete_evlist;
+
+ if (ftrace.tracer == NULL)
+ ftrace.tracer = DEFAULT_TRACER;
+
+ ret = __cmd_ftrace(&ftrace, argc, argv);
+
+ perf_evlist__delete_maps(ftrace.evlist);
+out_delete_evlist:
+ perf_evlist__delete(ftrace.evlist);
+
+ return ret;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index b210d62907e4..55da75c4e990 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -37,6 +37,7 @@ extern int cmd_test(int argc, const char **argv, const char *prefix);
extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_inject(int argc, const char **argv, const char *prefix);
extern int cmd_mem(int argc, const char **argv, const char *prefix);
+extern int cmd_ftrace(int argc, const char **argv, const char *prefix);

extern int find_scripts(char **scripts_array, char **scripts_path_array);
#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 0906fc401c52..c24d76e61bc1 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -9,6 +9,7 @@ perf-buildid-cache mainporcelain common
perf-buildid-list mainporcelain common
perf-diff mainporcelain common
perf-evlist mainporcelain common
+perf-ftrace mainporcelain common
perf-inject mainporcelain common
perf-kmem mainporcelain common
perf-kvm mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 85e1aed95204..8966f4b718b8 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -61,6 +61,7 @@ static struct cmd_struct commands[] = {
#endif
{ "inject", cmd_inject, 0 },
{ "mem", cmd_mem, 0 },
+ { "ftrace", cmd_ftrace, 0 },
};

struct pager_config {
--
1.7.11.7

2013-07-30 09:23:42

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

From: Namhyung Kim <[email protected]>

Current trace info data lacks the saved cmdline mapping which is
needed for pevent to find out the comm of a task. Add this and bump
up the version number so that perf can determine its presence when
reading.

This is mostly corresponding to trace.dat file version 6, but still
lacks 4 byte of number of cpus, and 10 bytes of type string - and I
think we don't need those anyway.

Cc: Steven Rostedt <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/trace-event-info.c | 33 ++++++++++++++++++++++++++++++++-
tools/perf/util/trace-event-parse.c | 17 +++++++++++++++++
tools/perf/util/trace-event-read.c | 36 ++++++++++++++++++++++++++++++++++--
tools/perf/util/trace-event.h | 1 +
4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index f3c9e551bd35..48678ca4c4d6 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -41,7 +41,7 @@
#include <lk/debugfs.h>
#include "evsel.h"

-#define VERSION "0.5"
+#define VERSION "0.6"

static int output_fd;

@@ -390,6 +390,34 @@ out:
return err;
}

+static int record_saved_cmdline(void)
+{
+ unsigned int size;
+ char *path;
+ struct stat st;
+ int ret, err = 0;
+
+ path = get_tracing_file("saved_cmdlines");
+ if (!path) {
+ pr_debug("can't get tracing/saved_cmdline");
+ return -ENOMEM;
+ }
+
+ ret = stat(path, &st);
+ if (ret < 0) {
+ /* not found */
+ size = 0;
+ if (write(output_fd, &size, 8) != 8)
+ err = -EIO;
+ goto out;
+ }
+ err = record_file(path, 8);
+
+out:
+ put_tracing_file(path);
+ return err;
+}
+
static void
put_tracepoints_path(struct tracepoint_path *tps)
{
@@ -550,6 +578,9 @@ struct tracing_data *tracing_data_get(struct list_head *pattrs,
if (err)
goto out;
err = record_ftrace_printk();
+ if (err)
+ goto out;
+ err = record_saved_cmdline();

out:
/*
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index fe7a27d67d2b..ef09e4720e04 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -232,6 +232,23 @@ void parse_ftrace_printk(struct pevent *pevent,
}
}

+void parse_saved_cmdline(struct pevent *pevent,
+ char *file, unsigned int size __maybe_unused)
+{
+ char *comm;
+ char *line;
+ char *next = NULL;
+ int pid;
+
+ line = strtok_r(file, "\n", &next);
+ while (line) {
+ sscanf(line, "%d %ms", &pid, &comm);
+ pevent_register_comm(pevent, comm, pid);
+ free(comm);
+ line = strtok_r(NULL, "\n", &next);
+ }
+}
+
int parse_ftrace_file(struct pevent *pevent, char *buf, unsigned long size)
{
return pevent_parse_event(pevent, buf, size, "ftrace");
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index f2112270c663..e084e5e654b6 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -343,6 +343,31 @@ static int read_event_files(struct pevent *pevent)
return 0;
}

+static int read_saved_cmdline(struct pevent *pevent)
+{
+ unsigned long long size;
+ char *buf;
+
+ /* it can have 0 size */
+ size = read8(pevent);
+ if (!size)
+ return 0;
+
+ buf = malloc(size + 1);
+ if (buf == NULL)
+ return -1;
+
+ if (do_read(buf, size) < 0) {
+ free(buf);
+ return -1;
+ }
+
+ parse_saved_cmdline(pevent, buf, size);
+
+ free(buf);
+ return 0;
+}
+
ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
{
char buf[BUFSIZ];
@@ -383,10 +408,11 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
return -1;
if (show_version)
printf("version = %s\n", version);
- free(version);

- if (do_read(buf, 1) < 0)
+ if (do_read(buf, 1) < 0) {
+ free(version);
return -1;
+ }
file_bigendian = buf[0];
host_bigendian = bigendian();

@@ -422,6 +448,11 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
err = read_ftrace_printk(pevent);
if (err)
goto out;
+ if (!strcmp(version, "0.6")) {
+ err = read_saved_cmdline(pevent);
+ if (err)
+ goto out;
+ }

size = trace_data_size;
repipe = false;
@@ -438,5 +469,6 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
out:
if (pevent)
pevent_free(pevent);
+ free(version);
return size;
}
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index fafe1a40444a..52fc78d4b40c 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -31,6 +31,7 @@ void *raw_field_ptr(struct event_format *event, const char *name, void *data);

void parse_proc_kallsyms(struct pevent *pevent, char *file, unsigned int size);
void parse_ftrace_printk(struct pevent *pevent, char *file, unsigned int size);
+void parse_saved_cmdline(struct pevent *pevent, char *file, unsigned int size);

ssize_t trace_report(int fd, struct pevent **pevent, bool repipe);

--
1.7.11.7

2013-07-30 09:23:41

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 04/17] perf ftrace: Add support for --pid option

From: Namhyung Kim <[email protected]>

The -p (--pid) option enables to trace existing process by its pid.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 89 ++++++++++++++++++++++++++++++++-------------
1 file changed, 63 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 6f3dd02c5b9a..bd415f2b1cde 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -11,6 +11,7 @@

#include <unistd.h>
#include <signal.h>
+#include <fcntl.h>

#include "util/debug.h"
#include "util/parse-options.h"
@@ -34,11 +35,12 @@ static void sig_handler(int sig __maybe_unused)
done = true;
}

-static int write_tracing_file(const char *name, const char *val)
+static int __write_tracing_file(const char *name, const char *val, bool append)
{
char *file;
int fd, ret = -1;
ssize_t size = strlen(val);
+ int flags = O_WRONLY;

file = get_tracing_file(name);
if (!file) {
@@ -46,7 +48,12 @@ static int write_tracing_file(const char *name, const char *val)
return -1;
}

- fd = open(file, O_WRONLY);
+ if (append)
+ flags |= O_APPEND;
+ else
+ flags |= O_TRUNC;
+
+ fd = open(file, flags);
if (fd < 0) {
pr_debug("cannot open tracing file: %s\n", name);
goto out;
@@ -63,6 +70,16 @@ out:
return ret;
}

+static int write_tracing_file(const char *name, const char *val)
+{
+ return __write_tracing_file(name, val, false);
+}
+
+static int append_tracing_file(const char *name, const char *val)
+{
+ return __write_tracing_file(name, val, true);
+}
+
static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
{
if (write_tracing_file("tracing_on", "0") < 0)
@@ -77,11 +94,27 @@ static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
return 0;
}

+static int set_tracing_pid(struct perf_ftrace *ftrace)
+{
+ int i;
+ char buf[16];
+
+ if (perf_target__has_cpu(&ftrace->target))
+ return 0;
+
+ for (i = 0; i < thread_map__nr(ftrace->evlist->threads); i++) {
+ scnprintf(buf, sizeof(buf), "%d",
+ ftrace->evlist->threads->map[i]);
+ if (append_tracing_file("set_ftrace_pid", buf) < 0)
+ return -1;
+ }
+ return 0;
+}
+
static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
{
char *trace_file;
int trace_fd;
- char *trace_pid;
char buf[4096];
struct pollfd pollfd = {
.events = POLLIN,
@@ -92,42 +125,37 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
return -1;
}

- if (argc < 1)
- return -1;
-
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);

- reset_tracing_files(ftrace);
+ if (reset_tracing_files(ftrace) < 0)
+ goto out;

/* reset ftrace buffer */
if (write_tracing_file("trace", "0") < 0)
goto out;

- if (perf_evlist__prepare_workload(ftrace->evlist, &ftrace->target,
- argv, false, true) < 0)
- goto out;
-
- if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
- pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+ if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0) {
goto out;
}

- if (asprintf(&trace_pid, "%d", ftrace->evlist->threads->map[0]) < 0) {
- pr_err("failed to allocate pid string\n");
- goto out;
+ if (set_tracing_pid(ftrace) < 0) {
+ pr_err("failed to set ftrace pid\n");
+ goto out_reset;
}

- if (write_tracing_file("set_ftrace_pid", trace_pid) < 0) {
- pr_err("failed to set pid: %s\n", trace_pid);
- goto out_free_pid;
+ if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
+ pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+ goto out_reset;
}

trace_file = get_tracing_file("trace_pipe");
if (!trace_file) {
pr_err("failed to open trace_pipe\n");
- goto out_free_pid;
+ goto out_reset;
}

trace_fd = open(trace_file, O_RDONLY);
@@ -136,7 +164,7 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)

if (trace_fd < 0) {
pr_err("failed to open trace_pipe\n");
- goto out_free_pid;
+ goto out_reset;
}

fcntl(trace_fd, F_SETFL, O_NONBLOCK);
@@ -175,11 +203,9 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)

out_close_fd:
close(trace_fd);
-out_free_pid:
- free(trace_pid);
-out:
+out_reset:
reset_tracing_files(ftrace);
-
+out:
return done ? 0 : -1;
}

@@ -197,6 +223,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
const struct option ftrace_options[] = {
OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
"tracer to use: function_graph or function"),
+ OPT_STRING('p', "pid", &ftrace.target.pid, "pid",
+ "trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose"),
OPT_END()
@@ -204,9 +232,18 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)

argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
- if (!argc)
+ if (!argc && perf_target__none(&ftrace.target))
usage_with_options(ftrace_usage, ftrace_options);

+ ret = perf_target__validate(&ftrace.target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace.target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
ftrace.evlist = perf_evlist__new();
if (ftrace.evlist == NULL)
return -ENOMEM;
--
1.7.11.7

2013-07-30 13:28:52

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

Em Tue, Jul 30, 2013 at 06:18:58PM +0900, Namhyung Kim escreveu:
> From: Namhyung Kim <[email protected]>
>
> Current trace info data lacks the saved cmdline mapping which is
> needed for pevent to find out the comm of a task. Add this and bump
> up the version number so that perf can determine its presence when
> reading.

Can't we get this from the PERF_RECORD_COMM synthesized at the beginning
of the session + the kernel generated ones for new threads?

We would just call machine__find_thread() and use thread->comm, etc, no?

- Arnaldo

2013-07-30 13:33:54

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 02/17] perf util: Add more debug message on failure path

Em Tue, Jul 30, 2013 at 06:18:59PM +0900, Namhyung Kim escreveu:
> From: Namhyung Kim <[email protected]>
>
> It's helpful for debugging on tracing features.

I wonder if we shouldn't start using 'perf probe to insert those
things... Having some shell script with a series of probe insertions...
What do you think? At least we would be eating our own dog food 8-)

Or at least have something that shows the source file name and line
number, so that things like:

static int read_event_file(struct pevent *pevent, char *sys,
unsigned long long size)
{
+ int ret;
char *buf;

buf = malloc(size);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

actually help us 8-)

- Arnaldo

> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/header.c | 4 ++-
> tools/perf/util/trace-event-read.c | 53 ++++++++++++++++++++++++++------------
> 2 files changed, 39 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index f558f83769af..11b08f24ba42 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2700,8 +2700,10 @@ static int perf_evsel__prepare_tracepoint_event(struct perf_evsel *evsel,
> }
>
> event = pevent_find_event(pevent, evsel->attr.config);
> - if (event == NULL)
> + if (event == NULL) {
> + pr_debug("cannot find event format for %d\n", (int)evsel->attr.config);
> return -1;
> + }
>
> if (!evsel->name) {
> snprintf(bf, sizeof(bf), "%s:%s", event->system, event->name);
> diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
> index e084e5e654b6..0e3b3f527320 100644
> --- a/tools/perf/util/trace-event-read.c
> +++ b/tools/perf/util/trace-event-read.c
> @@ -262,39 +262,53 @@ static int read_header_files(struct pevent *pevent)
>
> static int read_ftrace_file(struct pevent *pevent, unsigned long long size)
> {
> + int ret;
> char *buf;
>
> buf = malloc(size);
> - if (buf == NULL)
> + if (buf == NULL) {
> + pr_debug("memory allocation failure\n");
> return -1;
> + }
>
> - if (do_read(buf, size) < 0) {
> - free(buf);
> - return -1;
> + ret = do_read(buf, size);
> + if (ret < 0) {
> + pr_debug("error reading ftrace file.\n");
> + goto out;
> }
>
> - parse_ftrace_file(pevent, buf, size);
> + ret = parse_ftrace_file(pevent, buf, size);
> + if (ret < 0)
> + pr_debug("error parsing ftrace file.\n");
> +out:
> free(buf);
> - return 0;
> + return ret;
> }
>
> static int read_event_file(struct pevent *pevent, char *sys,
> unsigned long long size)
> {
> + int ret;
> char *buf;
>
> buf = malloc(size);
> - if (buf == NULL)
> + if (buf == NULL) {
> + pr_debug("memory allocation failure\n");
> return -1;
> + }
>
> - if (do_read(buf, size) < 0) {
> + ret = do_read(buf, size);
> + if (ret < 0) {
> free(buf);
> - return -1;
> + goto out;
> }
>
> - parse_event_file(pevent, buf, size, sys);
> + ret = parse_event_file(pevent, buf, size, sys);
> + if (ret < 0)
> + pr_debug("error parsing event file.\n");
> +out:
> free(buf);
> - return 0;
> + return ret;
> }
>
> static int read_ftrace_files(struct pevent *pevent)
> @@ -347,6 +361,7 @@ static int read_saved_cmdline(struct pevent *pevent)
> {
> unsigned long long size;
> char *buf;
> + int ret;
>
> /* it can have 0 size */
> size = read8(pevent);
> @@ -354,18 +369,22 @@ static int read_saved_cmdline(struct pevent *pevent)
> return 0;
>
> buf = malloc(size + 1);
> - if (buf == NULL)
> + if (buf == NULL) {
> + pr_debug("memory allocation failure\n");
> return -1;
> + }
>
> - if (do_read(buf, size) < 0) {
> - free(buf);
> - return -1;
> + ret = do_read(buf, size);
> + if (ret < 0) {
> + pr_debug("error reading saved cmdlines\n");
> + goto out;
> }
>
> parse_saved_cmdline(pevent, buf, size);
> -
> + ret = 0;
> +out:
> free(buf);
> - return 0;
> + return ret;
> }
>
> ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
> --
> 1.7.11.7

2013-07-31 00:51:58

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

Hi Arnaldo,

2013-07-30 오후 10:28, Arnaldo Carvalho de Melo 쓴 글:
> Em Tue, Jul 30, 2013 at 06:18:58PM +0900, Namhyung Kim escreveu:
>> From: Namhyung Kim <[email protected]>
>>
>> Current trace info data lacks the saved cmdline mapping which is
>> needed for pevent to find out the comm of a task. Add this and bump
>> up the version number so that perf can determine its presence when
>> reading.
>
> Can't we get this from the PERF_RECORD_COMM synthesized at the beginning
> of the session + the kernel generated ones for new threads?
>
> We would just call machine__find_thread() and use thread->comm, etc, no?

Yes, I think it's doable. However it needs an additional perf event
only to generate the COMM events. So I decided not to do it and reused
saved cmdline info from ftrace. But IIUC it also has a problem of
possible information loss due to the size of table.

Thanks,
Namhyung

2013-07-31 14:07:56

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

Em Wed, Jul 31, 2013 at 09:51:54AM +0900, Namhyung Kim escreveu:
> 2013-07-30 오후 10:28, Arnaldo Carvalho de Melo 쓴 글:
> >Em Tue, Jul 30, 2013 at 06:18:58PM +0900, Namhyung Kim escreveu:
> >>Current trace info data lacks the saved cmdline mapping which is
> >>needed for pevent to find out the comm of a task. Add this and bump
> >>up the version number so that perf can determine its presence when
> >>reading.

> >Can't we get this from the PERF_RECORD_COMM synthesized at the beginning
> >of the session + the kernel generated ones for new threads?

> >We would just call machine__find_thread() and use thread->comm, etc, no?

> Yes, I think it's doable. However it needs an additional perf event
> only to generate the COMM events. So I decided not to do it and

I don't think this is a problem, its in line with Ingo's suggestion of a
new perf ioctl to ask the kernel to generate PERF_RECORD_MMAP events for
existing threads.

- Arnaldo

2013-08-01 12:03:18

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
> This patchset implements a front-end tool for kernel's ftrace. It
> uses function_graph tracer by default and normal function tracer is
> also supported. (Of course you need to enable those tracers in your
> kernel first.)
>
> v4 changes:
> * use pid instead of tid for -p option (David)
> * not to poll() for reading ftrace pipes (Jiri)
> * rename to create_perf_header() (Jiri)
> * canonicalize directory name (Jiri)
> * show more info when -D option was given (Jiri)
> * update documentation (Jiri)
> * and few more bug fixes
>
> I pushed it out to 'perf/ftrace-v4' branch on my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
>
> Any comments are welcome, thanks,
> Namhyung

hi,
I've got following error when compiling on Fedora 16:

CC builtin-ftrace.o
builtin-ftrace.c: In function ‘set_tracing_clock’:
builtin-ftrace.c:197:14: error: declaration of ‘clock’ shadows a global declaration [-Werror=shadow]
builtin-ftrace.c: In function ‘canonicalize_directory_name’:
builtin-ftrace.c:350:53: error: declaration of ‘dirname’ shadows a global declaration [-Werror=shadow]
cc1: all warnings being treated as errors

Fedora 17 seems to be ok.. strange ;-)

jirka

2013-08-02 09:17:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)


* Jiri Olsa <[email protected]> wrote:

> On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
> > This patchset implements a front-end tool for kernel's ftrace. It
> > uses function_graph tracer by default and normal function tracer is
> > also supported. (Of course you need to enable those tracers in your
> > kernel first.)
> >
> > v4 changes:
> > * use pid instead of tid for -p option (David)
> > * not to poll() for reading ftrace pipes (Jiri)
> > * rename to create_perf_header() (Jiri)
> > * canonicalize directory name (Jiri)
> > * show more info when -D option was given (Jiri)
> > * update documentation (Jiri)
> > * and few more bug fixes
> >
> > I pushed it out to 'perf/ftrace-v4' branch on my tree at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> >
> >
> > Any comments are welcome, thanks,
> > Namhyung
>
> hi,
> I've got following error when compiling on Fedora 16:
>
> CC builtin-ftrace.o
> builtin-ftrace.c: In function ???set_tracing_clock???:
> builtin-ftrace.c:197:14: error: declaration of ???clock??? shadows a global declaration [-Werror=shadow]
> builtin-ftrace.c: In function ???canonicalize_directory_name???:
> builtin-ftrace.c:350:53: error: declaration of ???dirname??? shadows a global declaration [-Werror=shadow]
> cc1: all warnings being treated as errors
>
> Fedora 17 seems to be ok.. strange ;-)

Might be due to:

/usr/include/time.h:extern clock_t clock (void) __THROW;

which is a sad namespace land-grab by time.h, 'clock' is such a nice local
variable name.

Thanks,

Ingo

2013-08-05 07:17:54

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

Hi Arnaldo,

On Wed, 31 Jul 2013 11:07:46 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jul 31, 2013 at 09:51:54AM +0900, Namhyung Kim escreveu:
>> 2013-07-30 오후 10:28, Arnaldo Carvalho de Melo 쓴 글:
>> >Em Tue, Jul 30, 2013 at 06:18:58PM +0900, Namhyung Kim escreveu:
>> >>Current trace info data lacks the saved cmdline mapping which is
>> >>needed for pevent to find out the comm of a task. Add this and bump
>> >>up the version number so that perf can determine its presence when
>> >>reading.
>
>> >Can't we get this from the PERF_RECORD_COMM synthesized at the beginning
>> >of the session + the kernel generated ones for new threads?
>
>> >We would just call machine__find_thread() and use thread->comm, etc, no?
>
>> Yes, I think it's doable. However it needs an additional perf event
>> only to generate the COMM events. So I decided not to do it and
>
> I don't think this is a problem, its in line with Ingo's suggestion of a
> new perf ioctl to ask the kernel to generate PERF_RECORD_MMAP events for
> existing threads.

Hmm.. could you please give me a link of the thread?

And what kind of event should be used then?

Thanks,
Namhyung

2013-08-05 07:44:48

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

Hi Ingo and Jiri,

On Fri, 2 Aug 2013 11:16:59 +0200, Ingo Molnar wrote:
> * Jiri Olsa <[email protected]> wrote:
>
>> On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
>> > This patchset implements a front-end tool for kernel's ftrace. It
>> > uses function_graph tracer by default and normal function tracer is
>> > also supported. (Of course you need to enable those tracers in your
>> > kernel first.)
>> >
>> > v4 changes:
>> > * use pid instead of tid for -p option (David)
>> > * not to poll() for reading ftrace pipes (Jiri)
>> > * rename to create_perf_header() (Jiri)
>> > * canonicalize directory name (Jiri)
>> > * show more info when -D option was given (Jiri)
>> > * update documentation (Jiri)
>> > * and few more bug fixes
>> >
>> > I pushed it out to 'perf/ftrace-v4' branch on my tree at:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>> >
>> >
>> > Any comments are welcome, thanks,
>> > Namhyung
>>
>> hi,
>> I've got following error when compiling on Fedora 16:
>>
>> CC builtin-ftrace.o
>> builtin-ftrace.c: In function ???set_tracing_clock???:
>> builtin-ftrace.c:197:14: error: declaration of ???clock??? shadows a global declaration [-Werror=shadow]
>> builtin-ftrace.c: In function ???canonicalize_directory_name???:
>> builtin-ftrace.c:350:53: error: declaration of ???dirname??? shadows a global declaration [-Werror=shadow]
>> cc1: all warnings being treated as errors
>>
>> Fedora 17 seems to be ok.. strange ;-)
>
> Might be due to:
>
> /usr/include/time.h:extern clock_t clock (void) __THROW;
>
> which is a sad namespace land-grab by time.h, 'clock' is such a nice local
> variable name.

Thanks for the info. I'll change the names.

Thanks,
Namhyung

2013-08-07 13:44:27

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

On 8/5/13 3:17 AM, Namhyung Kim wrote:
>> I don't think this is a problem, its in line with Ingo's suggestion of a
>> new perf ioctl to ask the kernel to generate PERF_RECORD_MMAP events for
>> existing threads.
>
> Hmm.. could you please give me a link of the thread?

I believe this is the thread being referred to:
https://lkml.org/lkml/2013/6/25/180

David

2013-08-09 05:51:05

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

Hi David,

On Wed, 07 Aug 2013 09:44:24 -0400, David Ahern wrote:
> On 8/5/13 3:17 AM, Namhyung Kim wrote:
>>> I don't think this is a problem, its in line with Ingo's suggestion of a
>>> new perf ioctl to ask the kernel to generate PERF_RECORD_MMAP events for
>>> existing threads.
>>
>> Hmm.. could you please give me a link of the thread?
>
> I believe this is the thread being referred to:
> https://lkml.org/lkml/2013/6/25/180

Thank you!
Namhyung

2013-08-09 10:23:32

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
> This patchset implements a front-end tool for kernel's ftrace. It
> uses function_graph tracer by default and normal function tracer is
> also supported. (Of course you need to enable those tracers in your
> kernel first.)
>
> v4 changes:
> * use pid instead of tid for -p option (David)
> * not to poll() for reading ftrace pipes (Jiri)
> * rename to create_perf_header() (Jiri)
> * canonicalize directory name (Jiri)
> * show more info when -D option was given (Jiri)
> * update documentation (Jiri)
> * and few more bug fixes
>
> I pushed it out to 'perf/ftrace-v4' branch on my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

got following for record:

[root@dhcp-26-214 perf]# ./perf ftrace record kill
usage: kill [ -s signal | -p ] [ -a ] pid ...
kill -l [ signal ]
recorder failed for some reason on cpu0
recorder failed for some reason on cpu1


afterwards 'perf ftrace report' gets stuck with 2 processes:

[jolsa@dhcp-26-214 perf]$ sudo pstack 1470
#0 0x00007f23ab61e997 in munmap () from /lib64/libc.so.6
#1 0x000000000047b5f7 in __perf_session__process_events ()
#2 0x000000000043f230 in do_ftrace_report ()
#3 0x0000000000440555 in cmd_ftrace ()
#4 0x000000000041aad3 in run_builtin ()
#5 0x000000000041a379 in main ()
[jolsa@dhcp-26-214 perf]$ sudo pstack 1471
#0 0x00007f23ab61b623 in __select_nocancel () from /lib64/libc.so.6
#1 0x000000000046a337 in pager_preexec ()
#2 0x000000000045f5a3 in start_command ()
#3 0x000000000046a4ab in setup_pager ()
#4 0x0000000000440535 in cmd_ftrace ()
#5 0x000000000041aad3 in run_builtin ()
#6 0x000000000041a379 in main ()


jirka

2013-08-09 10:25:31

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
> This patchset implements a front-end tool for kernel's ftrace. It
> uses function_graph tracer by default and normal function tracer is
> also supported. (Of course you need to enable those tracers in your
> kernel first.)
>
> v4 changes:
> * use pid instead of tid for -p option (David)
> * not to poll() for reading ftrace pipes (Jiri)
> * rename to create_perf_header() (Jiri)
> * canonicalize directory name (Jiri)
> * show more info when -D option was given (Jiri)
> * update documentation (Jiri)
> * and few more bug fixes
>
> I pushed it out to 'perf/ftrace-v4' branch on my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

also, this one works now:

./perf --no-pager ftrace live -t function -p `pgrep yes`

but if I kill 'yes' perf live commands hangs,
I'd expect it'd stop

jirka

2013-08-09 10:31:00

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

On Fri, Aug 09, 2013 at 12:25:15PM +0200, Jiri Olsa wrote:
> On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
> > This patchset implements a front-end tool for kernel's ftrace. It
> > uses function_graph tracer by default and normal function tracer is
> > also supported. (Of course you need to enable those tracers in your
> > kernel first.)
> >
> > v4 changes:
> > * use pid instead of tid for -p option (David)
> > * not to poll() for reading ftrace pipes (Jiri)
> > * rename to create_perf_header() (Jiri)
> > * canonicalize directory name (Jiri)
> > * show more info when -D option was given (Jiri)
> > * update documentation (Jiri)
> > * and few more bug fixes
> >
> > I pushed it out to 'perf/ftrace-v4' branch on my tree at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> also, this one works now:
>
> ./perf --no-pager ftrace live -t function -p `pgrep yes`
>
> but if I kill 'yes' perf live commands hangs,
> I'd expect it'd stop

on the other hand it looks like it's common behaviour
for standard record/stat... ;-)

jirka

2013-08-13 02:20:57

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

Hi Jiri and Jeremy,

On Fri, 9 Aug 2013 12:23:06 +0200, Jiri Olsa wrote:
> On Tue, Jul 30, 2013 at 06:18:57PM +0900, Namhyung Kim wrote:
>> This patchset implements a front-end tool for kernel's ftrace. It
>> uses function_graph tracer by default and normal function tracer is
>> also supported. (Of course you need to enable those tracers in your
>> kernel first.)
>>
>> v4 changes:
>> * use pid instead of tid for -p option (David)
>> * not to poll() for reading ftrace pipes (Jiri)
>> * rename to create_perf_header() (Jiri)
>> * canonicalize directory name (Jiri)
>> * show more info when -D option was given (Jiri)
>> * update documentation (Jiri)
>> * and few more bug fixes
>>
>> I pushed it out to 'perf/ftrace-v4' branch on my tree at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> got following for record:
>
> [root@dhcp-26-214 perf]# ./perf ftrace record kill
> usage: kill [ -s signal | -p ] [ -a ] pid ...
> kill -l [ signal ]
> recorder failed for some reason on cpu0
> recorder failed for some reason on cpu1

It seems like an empty trace pipe returns EAGAIN after child exited - I
expected an EOF. I don't know why I didn't meet this on my system (FYI
I'm running F17 and this patch tested on that kernel mostly). Maybe
recent kernel changed the behavior.

Anyway I added a check and printed error message on other cases. Could
you please test whether the patch below works?

>
>
> afterwards 'perf ftrace report' gets stuck with 2 processes:
>
> [jolsa@dhcp-26-214 perf]$ sudo pstack 1470
> #0 0x00007f23ab61e997 in munmap () from /lib64/libc.so.6
> #1 0x000000000047b5f7 in __perf_session__process_events ()
> #2 0x000000000043f230 in do_ftrace_report ()
> #3 0x0000000000440555 in cmd_ftrace ()
> #4 0x000000000041aad3 in run_builtin ()
> #5 0x000000000041a379 in main ()

Hmm.. this repeated munmap() calls are fixed also.


> [jolsa@dhcp-26-214 perf]$ sudo pstack 1471
> #0 0x00007f23ab61b623 in __select_nocancel () from /lib64/libc.so.6
> #1 0x000000000046a337 in pager_preexec ()
> #2 0x000000000045f5a3 in start_command ()
> #3 0x000000000046a4ab in setup_pager ()
> #4 0x0000000000440535 in cmd_ftrace ()
> #5 0x000000000041aad3 in run_builtin ()
> #6 0x000000000041a379 in main ()



------------8<-------------
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 9e78ec19caeb..10590b794cae 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -555,17 +555,25 @@ sleep:
while (true) {
int n = read(trace_fd, buf, sizeof(buf));

- if (n < 0)
- goto out_close;
- if (n == 0)
+ if (n < 0) {
+ if (errno == EINTR || errno == EAGAIN)
+ break;
+ perror("flush read");
+ goto out_close2;
+ } else if (n == 0)
break;
- if (write(output_fd, buf, n) != n)
- goto out_close;
+
+ if (write(output_fd, buf, n) != n) {
+ perror("flush write");
+ goto out_close2;
+ }

byte_written += n;
}
fra->state = RECORD_STATE__DONE;

+out_close2:
+ close(output_fd);
out_close:
close(trace_fd);
out:
@@ -579,6 +587,8 @@ out:
pthread_cond_signal(&recorder_ready_cond);
pthread_mutex_unlock(&recorder_mutex);
}
+
+ pr_debug2("done with %ld bytes\n", (long)byte_written);
return fra;
}

@@ -1139,12 +1149,12 @@ retry:
return record;
}

- munmap(fra->map, pevent_get_page_size(ftrace->pevent));
- fra->map = NULL;
-
if (fra->done)
return NULL;

+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+ fra->map = NULL;
+
fra->offset += pevent_get_page_size(ftrace->pevent);
if (fra->offset >= fra->size) {
/* EOF */

2013-08-28 15:42:23

by Jeremy Eder

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

On 130813 11:20:52, Namhyung Kim wrote:
> ------------8<-------------
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index 9e78ec19caeb..10590b794cae 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -555,17 +555,25 @@ sleep:
> while (true) {
> int n = read(trace_fd, buf, sizeof(buf));
>
> - if (n < 0)
> - goto out_close;
> - if (n == 0)
> + if (n < 0) {
> + if (errno == EINTR || errno == EAGAIN)
> + break;
> + perror("flush read");
> + goto out_close2;
> + } else if (n == 0)
> break;
> - if (write(output_fd, buf, n) != n)
> - goto out_close;
> +
> + if (write(output_fd, buf, n) != n) {
> + perror("flush write");
> + goto out_close2;
> + }
>
> byte_written += n;
> }
> fra->state = RECORD_STATE__DONE;
>
> +out_close2:
> + close(output_fd);
> out_close:
> close(trace_fd);
> out:
> @@ -579,6 +587,8 @@ out:
> pthread_cond_signal(&recorder_ready_cond);
> pthread_mutex_unlock(&recorder_mutex);
> }
> +
> + pr_debug2("done with %ld bytes\n", (long)byte_written);
> return fra;
> }
>

Hmm, I already had hunk #3 in your git tree v4.

> @@ -1139,12 +1149,12 @@ retry:
> return record;
> }
>
> - munmap(fra->map, pevent_get_page_size(ftrace->pevent));
> - fra->map = NULL;
> -
> if (fra->done)
> return NULL;
>
> + munmap(fra->map, pevent_get_page_size(ftrace->pevent));
> + fra->map = NULL;
> +
> fra->offset += pevent_get_page_size(ftrace->pevent);
> if (fra->offset >= fra->size) {
> /* EOF */


After patching your tree with just the first 2 hunks, I'm able to
get ftrace-style function graphing out of perf.

# ./perf ftrace record df
Filesystem 1K-blocks Used Available Use% Mounted on
<snip>...

# ./perf --no-pager ftrace show | head -20
overriding event (11) ftrace:funcgraph_entry with new print handler
overriding event (10) ftrace:funcgraph_exit with new print handler
2) 0.686 us | finish_task_switch();
2) 0.260 us | finish_wait();
2) | mutex_lock() {
2) 0.211 us | _cond_resched();
2) 1.170 us | }
2) 0.319 us | generic_pipe_buf_confirm();
2) 0.261 us | generic_pipe_buf_map();
2) 0.129 us | generic_pipe_buf_unmap();
2) 0.747 us | anon_pipe_buf_release();
2) 0.138 us | mutex_unlock();
2) | __wake_up_sync_key() {
2) 0.279 us | _raw_spin_lock_irqsave();
2) 0.135 us | __wake_up_common();
2) 0.133 us | __lock_text_start();
2) 3.386 us | }
2) | kill_fasync() {
2) | smp_reschedule_interrupt() {
2) 0.130 us | kvm_guest_apic_eoi_write();

Nice.

Not sure if you intend to move all ftrace functionality over to
perf ftrace, but the function graph timings is a great start and something sorely
missing.

Do you intend to add -e event support or -l function-specific options ? In the real
world, without filtering on events or functions, I've had systems hang, plus
performance impact is too great.

A common invocation of ftrace via trace-cmd is:
# trace-cmd record -p function_graph -e irq:* -l do_IRQ ping -c1 http://www.redhat.com

So possible perf equivalent?
# ./perf ftrace record -e irq:* -e do_IRQ ping -c1 http://www.redhat.com

Thanks!

2013-08-29 02:56:49

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCHSET 00/17] perf tools: Introduce new 'ftrace' command (v4)

Hi Jeremy,

On Wed, 28 Aug 2013 10:57:08 -0400, Jeremy Eder wrote:
> On 130813 11:20:52, Namhyung Kim wrote:
>> @@ -579,6 +587,8 @@ out:
>> pthread_cond_signal(&recorder_ready_cond);
>> pthread_mutex_unlock(&recorder_mutex);
>> }
>> +
>> + pr_debug2("done with %ld bytes\n", (long)byte_written);
>> return fra;
>> }
>>
>
> Hmm, I already had hunk #3 in your git tree v4.

Oops, sorry about that.

>
>> @@ -1139,12 +1149,12 @@ retry:
>> return record;
>> }
>>
>> - munmap(fra->map, pevent_get_page_size(ftrace->pevent));
>> - fra->map = NULL;
>> -
>> if (fra->done)
>> return NULL;
>>
>> + munmap(fra->map, pevent_get_page_size(ftrace->pevent));
>> + fra->map = NULL;
>> +
>> fra->offset += pevent_get_page_size(ftrace->pevent);
>> if (fra->offset >= fra->size) {
>> /* EOF */
>
>
> After patching your tree with just the first 2 hunks, I'm able to
> get ftrace-style function graphing out of perf.
>
> # ./perf ftrace record df
> Filesystem 1K-blocks Used Available Use% Mounted on
> <snip>...
>
> # ./perf --no-pager ftrace show | head -20
> overriding event (11) ftrace:funcgraph_entry with new print handler
> overriding event (10) ftrace:funcgraph_exit with new print handler
> 2) 0.686 us | finish_task_switch();
> 2) 0.260 us | finish_wait();
> 2) | mutex_lock() {
> 2) 0.211 us | _cond_resched();
> 2) 1.170 us | }
> 2) 0.319 us | generic_pipe_buf_confirm();
> 2) 0.261 us | generic_pipe_buf_map();
> 2) 0.129 us | generic_pipe_buf_unmap();
> 2) 0.747 us | anon_pipe_buf_release();
> 2) 0.138 us | mutex_unlock();
> 2) | __wake_up_sync_key() {
> 2) 0.279 us | _raw_spin_lock_irqsave();
> 2) 0.135 us | __wake_up_common();
> 2) 0.133 us | __lock_text_start();
> 2) 3.386 us | }
> 2) | kill_fasync() {
> 2) | smp_reschedule_interrupt() {
> 2) 0.130 us | kvm_guest_apic_eoi_write();
>
> Nice.

Glad to see this. :)

>
> Not sure if you intend to move all ftrace functionality over to
> perf ftrace, but the function graph timings is a great start and something sorely
> missing.
>
> Do you intend to add -e event support or -l function-specific options ? In the real
> world, without filtering on events or functions, I've had systems hang, plus
> performance impact is too great.
>
> A common invocation of ftrace via trace-cmd is:
> # trace-cmd record -p function_graph -e irq:* -l do_IRQ ping -c1 http://www.redhat.com
>
> So possible perf equivalent?
> # ./perf ftrace record -e irq:* -e do_IRQ ping -c1 http://www.redhat.com

Yes I'm considering adding more features in trace-cmd to perf ftrace.
The function filtering is the one in the highest priority. I also want
to add support for other events but it needs bit more thinking.

Thanks,
Namhyung