2013-04-23 08:35:39

by Namhyung Kim

[permalink] [raw]
Subject: [RFC 00/14] perf tools: Introduce new 'ftrace' command

Hello,

This patchset implements a front-end tool for kernel's ftrace. It
uses function_graph tracer by default and normal function tracer is
also supported. (Of course you need to enable those tracers in your
kernel first.)

NOTE: It's in very early stage, so may contain many rough edges.

It consists of 4 subcommands: live, record, show and report.

'perf ftrace live' just triggers ftrace and relay kernel buffer
contents to stdout. It does no processing in the tool side.

'perf ftrace record' starts ftrace and saves its result to per-cpu
files and a perf.header file in the perf.data.dir directory.
Recording was done by multiple threads (a thread per cpu) in order not
to miss events overrun. The perf.header file is compatible to current
perf.data file and contains useful information and sample data.

The sample data were synthesized for each recorded cpu to provide more
information - I'm not sure it's really needed though. :)

Once you had run 'perf ftrace record', you could play with other
subcommands.

'perf ftrace show' displays function traces like 'live' subcommand or
trace-cmd does. It's not useful than them at this time but it could
be improved soon.

'perf ftrace report' displays usual 'perf report' style output from
the function trace data. You can see which function is called most
frequently for example. Currently it uses 1 as a period value for
each entry but we might use funcgraph_exit->calltime to get proper
overhead later.

Example below:

# perf ftrace record sleep 0.1
# ls -l perf.data.dir
total 5568
-rw-r--r--. 1 root root 3514375 Apr 23 16:43 perf.header
-rw-r--r--. 1 root root 90112 Apr 23 16:43 trace-cpu0.buf
-rw-r--r--. 1 root root 0 Apr 23 16:43 trace-cpu1.buf
-rw-r--r--. 1 root root 2093056 Apr 23 16:43 trace-cpu2.buf
-rw-r--r--. 1 root root 0 Apr 23 16:43 trace-cpu3.buf

# perf ftrace show
overriding event (11) ftrace:funcgraph_entry with new print handler
overriding event (10) ftrace:funcgraph_exit with new print handler
sleep-11573 [000] 717758.728678862: funcgraph_entry: finish_task_switch() {
sleep-11573 [000] 717758.728682155: funcgraph_exit: }
sleep-11573 [000] 717758.728683099: funcgraph_entry: finish_wait() {
sleep-11573 [000] 717758.728683617: funcgraph_exit: }
sleep-11573 [000] 717758.728683993: funcgraph_entry: pipe_lock() {
sleep-11573 [000] 717758.728684425: funcgraph_entry: pipe_lock_nested.isra.
sleep-11573 [000] 717758.728684863: funcgraph_entry: mutex_lock() {
sleep-11573 [000] 717758.728685239: funcgraph_entry: _cond_resched() {
sleep-11573 [000] 717758.728685807: funcgraph_exit: }
sleep-11573 [000] 717758.728686296: funcgraph_exit: }
sleep-11573 [000] 717758.728686757: funcgraph_exit: }
sleep-11573 [000] 717758.728687529: funcgraph_exit: }
sleep-11573 [000] 717758.728688608: funcgraph_entry: mutex_unlock() {
sleep-11573 [000] 717758.728689343: funcgraph_exit: }
sleep-11573 [000] 717758.728690685: funcgraph_entry: __audit_syscall_exit() {
sleep-11573 [000] 717758.728691499: funcgraph_entry: path_put() {
sleep-11573 [000] 717758.728692147: funcgraph_entry: dput() {
sleep-11573 [000] 717758.728692845: funcgraph_exit: }
sleep-11573 [000] 717758.728693380: funcgraph_entry: mntput() {
sleep-11573 [000] 717758.728694021: funcgraph_exit: }
sleep-11573 [000] 717758.728694520: funcgraph_exit: }
sleep-11573 [000] 717758.728695005: funcgraph_entry: unroll_tree_refs() {
sleep-11573 [000] 717758.728695659: funcgraph_exit: }
sleep-11573 [000] 717758.728696301: funcgraph_entry: kfree() {
sleep-11573 [000] 717758.728697085: funcgraph_exit: }
sleep-11573 [000] 717758.728697916: funcgraph_exit: }


# perf ftrace report
# ========
# captured on: Tue Apr 23 16:43:52 2013
# hostname : sejong.aot.lge.com
# os release : 3.6.11-5.fc17.x86_64
# perf version : 3.9.rc8.gceb5f8
# arch : x86_64
# nrcpus online : 12
# nrcpus avail : 12
# cpudesc : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
# cpuid : GenuineIntel,6,45,7
# total memory : 32901952 kB
# cmdline : /home/namhyung/project/linux/tools/perf/perf ftrace record sleep 0.1
# event : name = ftrace:funcgraph_entry, type = 2, config = 0xb, config1 = 0x0, config2 = 0x0
# event : name = ftrace:funcgraph_exit, type = 2, config = 0xa, config1 = 0x0, config2 = 0x0,
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 22K of event 'ftrace:funcgraph_entry'
# Event count (approx.): 22433
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............................
#
7.90% sleep [kernel.kallsyms] [k] _cond_resched
4.57% sleep [kernel.kallsyms] [k] ebitmap_get_bit
3.50% sleep [kernel.kallsyms] [k] __phys_addr
3.12% sleep [kernel.kallsyms] [k] _raw_spin_lock
3.09% sleep [kernel.kallsyms] [k] kmem_cache_free
2.93% sleep [kernel.kallsyms] [k] __slab_free
2.86% sleep [kernel.kallsyms] [k] __tlb_remove_page
2.35% sleep [kernel.kallsyms] [k] vm_normal_page
2.33% sleep [kernel.kallsyms] [k] page_remove_rmap
1.87% sleep [kernel.kallsyms] [k] mutex_unlock



You can get it from 'perf/ftrace-v1' branch on my tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Also note that this patchset is based on my previous work [1][2].

Any comments are welcome, thanks,
Namhyung


[1] https://lkml.org/lkml/2013/4/11/194
[2] https://lkml.org/lkml/2013/4/15/13


Namhyung Kim (14):
perf util: Move debugfs/tracing helper functions to util.c
perf util: Use evsel->name to get tracepoint_paths
perf util: Save pid-cmdline mapping into tracing header
perf util: Add more debug message on failure path
perf tools: Introduce new 'ftrace' tool
perf ftrace: Add support for --pid option
perf ftrace: Add support for -a and -C option
perf ftrace: Split "live" sub-command
perf ftrace: Add 'record' sub-command
perf ftrace: Add 'show' sub-command
perf ftrace: Add 'report' sub-command
perf ftrace: Use pager for displaying result
perf ftrace: Cleanup using ftrace_setup/teardown()
perf tools: Add document for perf-ftrace command

tools/perf/Documentation/perf-ftrace.txt | 114 ++
tools/perf/Makefile | 1 +
tools/perf/builtin-ftrace.c | 1665 ++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
tools/perf/util/header.c | 4 +-
tools/perf/util/trace-event-info.c | 111 +-
tools/perf/util/trace-event-parse.c | 17 +
tools/perf/util/trace-event-read.c | 77 +-
tools/perf/util/trace-event.h | 1 +
tools/perf/util/util.c | 59 ++
tools/perf/util/util.h | 3 +
13 files changed, 1981 insertions(+), 74 deletions(-)
create mode 100644 tools/perf/Documentation/perf-ftrace.txt
create mode 100644 tools/perf/builtin-ftrace.c

--
1.7.11.7


2013-04-23 08:31:19

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 01/14] perf util: Move debugfs/tracing helper functions to util.c

From: Namhyung Kim <[email protected]>

Since they're generic helpers move them to util.c so that they can be
used by others.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/trace-event-info.c | 59 --------------------------------------
tools/perf/util/util.c | 59 ++++++++++++++++++++++++++++++++++++++
tools/perf/util/util.h | 3 ++
3 files changed, 62 insertions(+), 59 deletions(-)

diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index e85cbcf298f3..ab18bf12d54a 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -46,65 +46,6 @@
static int output_fd;


-static const char *find_debugfs(void)
-{
- const char *path = perf_debugfs_mount(NULL);
-
- if (!path)
- pr_debug("Your kernel does not support the debugfs filesystem");
-
- return path;
-}
-
-/*
- * Finds the path to the debugfs/tracing
- * Allocates the string and stores it.
- */
-static const char *find_tracing_dir(void)
-{
- static char *tracing;
- static int tracing_found;
- const char *debugfs;
-
- if (tracing_found)
- return tracing;
-
- debugfs = find_debugfs();
- if (!debugfs)
- return NULL;
-
- tracing = malloc(strlen(debugfs) + 9);
- if (!tracing)
- return NULL;
-
- sprintf(tracing, "%s/tracing", debugfs);
-
- tracing_found = 1;
- return tracing;
-}
-
-static char *get_tracing_file(const char *name)
-{
- const char *tracing;
- char *file;
-
- tracing = find_tracing_dir();
- if (!tracing)
- return NULL;
-
- file = malloc(strlen(tracing) + strlen(name) + 2);
- if (!file)
- return NULL;
-
- sprintf(file, "%s/%s", tracing, name);
- return file;
-}
-
-static void put_tracing_file(char *file)
-{
- free(file);
-}
-
int bigendian(void)
{
unsigned char str[] = { 0x1, 0x2, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0};
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 59d868add275..9a0658405760 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -269,3 +269,62 @@ void perf_debugfs_set_path(const char *mntpt)
snprintf(debugfs_mountpoint, strlen(debugfs_mountpoint), "%s", mntpt);
set_tracing_events_path(mntpt);
}
+
+static const char *find_debugfs(void)
+{
+ const char *path = perf_debugfs_mount(NULL);
+
+ if (!path)
+ fprintf(stderr, "Your kernel does not support the debugfs filesystem");
+
+ return path;
+}
+
+/*
+ * Finds the path to the debugfs/tracing
+ * Allocates the string and stores it.
+ */
+const char *find_tracing_dir(void)
+{
+ static char *tracing;
+ static int tracing_found;
+ const char *debugfs;
+
+ if (tracing_found)
+ return tracing;
+
+ debugfs = find_debugfs();
+ if (!debugfs)
+ return NULL;
+
+ tracing = malloc(strlen(debugfs) + 9);
+ if (!tracing)
+ return NULL;
+
+ sprintf(tracing, "%s/tracing", debugfs);
+
+ tracing_found = 1;
+ return tracing;
+}
+
+char *get_tracing_file(const char *name)
+{
+ const char *tracing;
+ char *file;
+
+ tracing = find_tracing_dir();
+ if (!tracing)
+ return NULL;
+
+ file = malloc(strlen(tracing) + strlen(name) + 2);
+ if (!file)
+ return NULL;
+
+ sprintf(file, "%s/%s", tracing, name);
+ return file;
+}
+
+void put_tracing_file(char *file)
+{
+ free(file);
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 7a484c97e500..8f3e44ca1074 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -79,6 +79,9 @@ extern char buildid_dir[];
extern char tracing_events_path[];
extern void perf_debugfs_set_path(const char *mountpoint);
const char *perf_debugfs_mount(const char *mountpoint);
+const char *find_tracing_dir(void);
+char *get_tracing_file(const char *name);
+void put_tracing_file(char *file);

/* On most systems <limits.h> would have given us this, but
* not on some systems (e.g. GNU/Hurd).
--
1.7.11.7

2013-04-23 08:31:38

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 14/14] perf tools: Add document for perf-ftrace command

From: Namhyung Kim <[email protected]>

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-ftrace.txt | 114 +++++++++++++++++++++++++++++++
1 file changed, 114 insertions(+)
create mode 100644 tools/perf/Documentation/perf-ftrace.txt

diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
new file mode 100644
index 000000000000..f530cfabd8a5
--- /dev/null
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -0,0 +1,114 @@
+perf-ftrace(1)
+==============
+
+NAME
+----
+perf-ftrace - A front end for kernel's ftrace
+
+SYNOPSIS
+--------
+[verse]
+'perf ftrace' {live|record} [<options>] [<command>]
+'perf ftrace' {show|report} [<options>]
+
+DESCRIPTION
+-----------
+This command reads the ftrace buffer and displays the trace recorded.
+
+There are several variants of perf ftrace:
+
+ 'perf ftrace live <command>' to see a live trace of kernel functions
+ via trace_pipe during executing the <command>. If <command> is not
+ specified, one of target options (-p, -a or -C) should be given. It
+ just print out the result to stdout and doesn't save any files.
+
+ 'perf ftrace record <command>' to record trace entries of kernel
+ functions during the execution of the <command>. Like 'perf ftrace
+ live', at least one of <command> and/or target options should be
+ given in order to start recording. The recorded results are saved
+ under a directory ('perf.data.dir' by default) and will be used by
+ other perf-ftrace tools like 'show' and 'report'.
+
+ 'perf ftrace show' to see the trace of recorded functions. It shows
+ functions sorted by time so the end result might be interspersed if
+ there's a concurrent execution.
+
+ 'perf ftrace report' to display the result in an usual perf-report
+ style - entries are sorted by given sort keys and output is resorted
+ by its overhead.
+
+OPTIONS
+-------
+<command>...::
+ Any command you can specify in a shell.
+
+-t::
+--tracer=::
+ The ftrace tracer to be used (default: function_graph).
+ Currently only 'function' and 'function_graph' are supported.
+ Used by 'live' and 'record' subcommands only.
+
+-p::
+--pid=::
+ Record events on existing process ID (comma separated list).
+ Used by 'live' and 'record' subcommands only.
+
+-a::
+ Force system-wide collection. Scripts run without a <command>
+ normally use -a by default, while scripts run with a <command>
+ normally don't - this option allows the latter to be run in
+ system-wide mode. Used by 'live' and 'record' subcommands only.
+
+-C::
+--cpu:: Only process samples for the list of CPUs provided.
+ Multiple CPUs can be provided as a comma-separated list with
+ no space: 0,1. Ranges of CPUs are specified with -: 0-2.
+ Default is to report samples on all online CPUs.
+
+-s::
+--sort=::
+ Sort histogram entries by given key(s) - multiple keys can be
+ specified in CSV format. Following sort keys are available:
+ pid, comm, dso, symbol, cpu.
+
+ Each key has following meaning:
+
+ - comm: command (name) of the task
+ - pid: command and tid of the task
+ - dso: name of library or module executed at the time of sample
+ - symbol: name of function executed at the time of sample
+ - cpu: cpu number the task ran at the time of sample
+
+ By default, comm, dso and symbol keys are used.
+ (i.e. --sort comm,dso,symbol)
+ Used by 'report' subcommands only.
+
+-i::
+--input=::
+ Input directory name excluding '.dir' at the end.
+ (default: perf.data)
+
+-o::
+--output=::
+ Output directory name excluding '.dir' at the end.
+ (default: perf.data)
+
+-v::
+--verbose::
+ Be more verbose (show counter open errors, etc).
+
+-D::
+--dump-raw-trace::
+ Dump raw trace in ASCII. Used by 'report' subcommands only.
+
+-I::
+--show-info::
+ Display extended information about the record. This adds
+ information which may be very large and thus may clutter the
+ display. It currently includes: cpu and numa topology of the
+ host system. It can only be used with the 'report' subcommand.
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-report[1],
+linkperf:perf-script[1]
--
1.7.11.7

2013-04-23 08:31:40

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 13/14] perf ftrace: Cleanup using ftrace_setup/teardown()

From: Namhyung Kim <[email protected]>

The ftrace sub-commands share some common code so that factor it out
to ftrace_setup() and ftrace_teardown() helpers.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 168 +++++++++++++++++---------------------------
1 file changed, 64 insertions(+), 104 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index f4efa1475cf9..95f9a7fa3e04 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1409,10 +1409,59 @@ out:
return ret;
}

+static int ftrace_setup(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret;
+ char errbuf[512];
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->tracer == NULL)
+ ftrace->tracer = DEFAULT_TRACER;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ if (argc) {
+ ret = perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true);
+ if (ret < 0)
+ goto out_maps;
+ }
+ return 0;
+
+out_maps:
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
+static void ftrace_teardown(struct perf_ftrace *ftrace)
+{
+ perf_evlist__delete_maps(ftrace->evlist);
+ perf_evlist__delete(ftrace->evlist);
+}
+
static int
__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const live_usage[] = {
"perf ftrace live [<options>] [<command>]",
"perf ftrace live [<options>] -- <command> [<options>]",
@@ -1437,47 +1486,22 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(live_usage, live_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->tracer == NULL)
- ftrace->tracer = DEFAULT_TRACER;
-
- if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0)
- goto out_maps;
+ return ret;

setup_pager();

ret = do_ftrace_live(ftrace);

-out_maps:
- perf_evlist__delete_maps(ftrace->evlist);
-out:
- perf_evlist__delete(ftrace->evlist);
-
+ ftrace_teardown(ftrace);
return ret;
}

static int
__cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const record_usage[] = {
"perf ftrace record [<options>] [<command>]",
"perf ftrace record [<options>] -- <command> [<options>]",
@@ -1504,39 +1528,13 @@ __cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(record_usage, record_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->tracer == NULL)
- ftrace->tracer = DEFAULT_TRACER;
-
- if (ftrace->dirname == NULL)
- ftrace->dirname = DEFAULT_DIRNAME;
-
- if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0)
- goto out_maps;
+ return ret;

ret = do_ftrace_record(ftrace);

-out_maps:
perf_evlist__delete_maps(ftrace->evlist);
-out:
perf_evlist__delete(ftrace->evlist);

return ret;
@@ -1545,7 +1543,7 @@ out:
static int
__cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const show_usage[] = {
"perf ftrace show [<options>]",
NULL
@@ -1565,41 +1563,22 @@ __cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
if (argc)
usage_with_options(show_usage, show_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->dirname == NULL)
- ftrace->dirname = DEFAULT_DIRNAME;
+ return ret;

setup_pager();

ret = do_ftrace_show(ftrace);

- perf_evlist__delete_maps(ftrace->evlist);
-out:
- perf_evlist__delete(ftrace->evlist);
-
+ ftrace_teardown(ftrace);
return ret;
}

static int
__cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret = -1;
+ int ret;
const char * const report_usage[] = {
"perf ftrace report [<options>]",
NULL
@@ -1625,25 +1604,9 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
if (argc)
usage_with_options(report_usage, report_options);

- ret = perf_target__validate(&ftrace->target);
- if (ret) {
- char errbuf[512];
-
- perf_target__strerror(&ftrace->target, ret, errbuf, 512);
- pr_err("%s\n", errbuf);
- return -EINVAL;
- }
-
- ftrace->evlist = perf_evlist__new();
- if (ftrace->evlist == NULL)
- return -ENOMEM;
-
- ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
- goto out;
-
- if (ftrace->dirname == NULL)
- ftrace->dirname = DEFAULT_DIRNAME;
+ return ret;

perf_hpp__column_enable(PERF_HPP__OVERHEAD);
perf_hpp__init();
@@ -1657,10 +1620,7 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)

ret = do_ftrace_report(ftrace);

- perf_evlist__delete_maps(ftrace->evlist);
-out:
- perf_evlist__delete(ftrace->evlist);
-
+ ftrace_teardown(ftrace);
return ret;
}

--
1.7.11.7

2013-04-23 08:32:34

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 09/14] perf ftrace: Add 'record' sub-command

From: Namhyung Kim <[email protected]>

The ftrace record command is for saving raw ftrace buffer contents
which can be get from per_cpu/cpuX/trace_pipe_raw.

Since ftrace events are generated very frequently so single thread for
recording mostly resulted in buffer overruns. Thus it uses per-cpu
recorder thread to prevent such cases and they save the contents to
their own files.

These per-cpu data files are saved in a directory so that they can be
easily found when needed. I chose the default directory name as
"perf.data.dir" and the first two (i.e. "perf.data") can be changed
with -o option. The structure of the directory looks like:

$ tree perf.data.dir
perf.data.dir/
|-- perf.header
|-- trace-cpu0.buf
|-- trace-cpu1.buf
|-- trace-cpu2.buf
`-- trace-cpu3.buf

In addition to trace-cpuX.buf files, it has perf.header file also.
The perf.header file is compatible with existing perf.data format and
contains usual event information, feature mask and sample data. The
sample data is synthesized to indicate given cpu has a record file.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 632 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 614 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 6d991c44948f..7e72cd65dd0a 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -12,6 +12,7 @@
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
+#include <dirent.h>

#include "util/debug.h"
#include "util/parse-options.h"
@@ -19,14 +20,17 @@
#include "util/target.h"
#include "util/thread_map.h"
#include "util/cpumap.h"
+#include "util/trace-event.h"


#define DEFAULT_TRACER "function_graph"
+#define DEFAULT_DIRNAME "perf.data"

struct perf_ftrace {
struct perf_evlist *evlist;
struct perf_target target;
const char *tracer;
+ const char *dirname;
};

static bool done;
@@ -131,41 +135,57 @@ static int set_tracing_cpu(struct perf_ftrace *ftrace)
return 0;
}

-static int do_ftrace_live(struct perf_ftrace *ftrace)
+static int setup_tracing_files(struct perf_ftrace *ftrace)
{
- char *trace_file;
- int trace_fd;
- char buf[4096];
- struct pollfd pollfd = {
- .events = POLLIN,
- };
-
- signal(SIGINT, sig_handler);
- signal(SIGUSR1, sig_handler);
- signal(SIGCHLD, sig_handler);
+ int ret = -1;

- if (reset_tracing_files(ftrace) < 0)
+ if (reset_tracing_files(ftrace) < 0) {
+ pr_err("failed to reset tracing files\n");
goto out;
+ }

/* reset ftrace buffer */
- if (write_tracing_file("trace", "0") < 0)
+ if (write_tracing_file("trace", "0") < 0) {
+ pr_err("failed to reset ftrace buffer\n");
goto out;
+ }

if (set_tracing_pid(ftrace) < 0) {
pr_err("failed to set ftrace pid\n");
- goto out_reset;
+ goto out;
}

if (set_tracing_cpu(ftrace) < 0) {
pr_err("failed to set tracing cpumask\n");
- goto out_reset;
+ goto out;
}

if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
- goto out_reset;
+ goto out;
}

+ ret = 0;
+out:
+ return ret;
+}
+
+static int do_ftrace_live(struct perf_ftrace *ftrace)
+{
+ char *trace_file;
+ int trace_fd;
+ char buf[4096];
+ struct pollfd pollfd = {
+ .events = POLLIN,
+ };
+
+ signal(SIGINT, sig_handler);
+ signal(SIGUSR1, sig_handler);
+ signal(SIGCHLD, sig_handler);
+
+ if (setup_tracing_files(ftrace) < 0)
+ goto out_reset;
+
trace_file = get_tracing_file("trace_pipe");
if (!trace_file) {
pr_err("failed to open trace_pipe\n");
@@ -219,7 +239,513 @@ out_close_fd:
close(trace_fd);
out_reset:
reset_tracing_files(ftrace);
+ return done ? 0 : -1;
+}
+
+static int alloc_ftrace_evsel(struct perf_ftrace *ftrace)
+{
+ if (!strcmp(ftrace->tracer, "function")) {
+ if (perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+ "function", NULL) < 0) {
+ pr_err("failed to allocate ftrace event\n");
+ return -1;
+ }
+ } else if (!strcmp(ftrace->tracer, "function_graph")) {
+ if (perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+ "funcgraph_entry", NULL) ||
+ perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+ "funcgraph_exit", NULL)) {
+ pr_err("failed to allocate ftrace event\n");
+ return -1;
+ }
+ } else {
+ pr_err("Not supported tracer: %s\n", ftrace->tracer);
+ return -1;
+ }
+ return 0;
+}
+
+static int remove_directory(const char *pathname)
+{
+ DIR *dir;
+ int ret = 0;
+ struct dirent *dent;
+ char namebuf[PATH_MAX];
+
+ dir = opendir(pathname);
+ if (dir == NULL)
+ return 0;
+
+ while ((dent = readdir(dir)) != NULL && !ret) {
+ struct stat statbuf;
+
+ if (dent->d_name[0] == '.')
+ continue;
+
+ scnprintf(namebuf, sizeof(namebuf), "%s/%s",
+ pathname, dent->d_name);
+
+ ret = stat(namebuf, &statbuf);
+ if (ret < 0) {
+ pr_debug("stat failed\n");
+ break;
+ }
+
+ if (S_ISREG(statbuf.st_mode))
+ ret = unlink(namebuf);
+ else if (S_ISDIR(statbuf.st_mode))
+ ret = remove_directory(namebuf);
+ else {
+ pr_debug("unknown file.\n");
+ ret = -1;
+ }
+ }
+ closedir(dir);
+
+ if (ret < 0)
+ return ret;
+
+ return rmdir(pathname);
+}
+
+static int create_perf_directory(struct perf_ftrace *ftrace)
+{
+ int err;
+ char buf[PATH_MAX];
+ struct stat statbuf;
+
+ scnprintf(buf, sizeof(buf), "%s.dir", ftrace->dirname);
+
+ if (stat(buf, &statbuf) == 0) {
+ /* same name already exists - rename to *.old.dir */
+ char *old_name = malloc(strlen(buf) + 5);
+ if (old_name == NULL)
+ return -1;
+
+ scnprintf(old_name, strlen(buf) + 5,
+ "%s.old.dir", ftrace->dirname);
+
+ if (remove_directory(old_name) < 0) {
+ perror("rmdir");
+ return -1;
+ }
+
+ if (rename(buf, old_name) < 0) {
+ perror("rename");
+ free(old_name);
+ return -1;
+ }
+
+ free(old_name);
+ }
+
+ err = mkdir(buf, 0755);
+ if (err < 0) {
+ perror("mkdir");
+ return -1;
+ }
+
+ strcat(buf, "/perf.header");
+
+ err = open(buf, O_RDWR | O_CREAT | O_TRUNC, 0644);
+ return err;
+}
+
+static void sig_dummy_handler(int sig __maybe_unused)
+{
+ while (!done)
+ continue;
+}
+
+enum {
+ RECORD_STATE__ERROR = -1,
+ RECORD_STATE__INIT,
+ RECORD_STATE__READY,
+ RECORD_STATE__DONE,
+};
+
+struct ftrace_record_arg {
+ struct perf_ftrace *ftrace;
+ int cpu;
+ int state;
+ pthread_t id;
+ struct list_head node;
+};
+
+static int recorder_count;
+pthread_cond_t recorder_ready_cond = PTHREAD_COND_INITIALIZER;
+pthread_cond_t recorder_start_cond = PTHREAD_COND_INITIALIZER;
+pthread_mutex_t recorder_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void *record_ftrace_raw_buffer(void *arg)
+{
+ struct ftrace_record_arg *fra = arg;
+ char buf[4096];
+ char *trace_file;
+ int trace_fd;
+ int output_fd;
+ int ret, n;
+ off_t byte_written = 0;
+ sigset_t sigmask;
+ struct pollfd pollfd = {
+ .events = POLLIN,
+ };
+
+ fra->state = RECORD_STATE__ERROR;
+
+ snprintf(buf, sizeof(buf), "per_cpu/cpu%d/trace_pipe_raw", fra->cpu);
+
+ trace_file = get_tracing_file(buf);
+ if (!trace_file) {
+ pr_err("failed to get trace_pipe_raw\n");
+ goto out;
+ }
+
+ trace_fd = open(trace_file, O_RDONLY);
+
+ put_tracing_file(trace_file);
+
+ if (trace_fd < 0) {
+ pr_err("failed to open trace_pipe_raw\n");
+ goto out;
+ }
+
+ snprintf(buf, sizeof(buf), "%s.dir/trace-cpu%d.buf",
+ fra->ftrace->dirname, fra->cpu);
+
+ output_fd = open(buf, O_WRONLY|O_CREAT|O_TRUNC, 0644);
+ if (output_fd < 0) {
+ pr_err("failed to open output file\n");
+ goto out_close;
+ }
+
+ fra->state = RECORD_STATE__READY;
+
+ /* Now I'm ready */
+ pthread_mutex_lock(&recorder_mutex);
+ recorder_count++;
+ pthread_cond_signal(&recorder_ready_cond);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ /*
+ * block all signals but SIGUSR2.
+ * It'll be used to unblock a recorder to finish.
+ */
+ sigfillset(&sigmask);
+ sigdelset(&sigmask, SIGUSR2);
+ pthread_sigmask(SIG_SETMASK, &sigmask,NULL);
+
+ signal(SIGUSR2, sig_dummy_handler);
+
+ fcntl(trace_fd, F_SETFL, O_NONBLOCK);
+ pollfd.fd = trace_fd;
+
+ pthread_mutex_lock(&recorder_mutex);
+ pthread_cond_wait(&recorder_start_cond, &recorder_mutex);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ pr_debug2("now recording for cpu%d\n", fra->cpu);
+
+ while (!done) {
+ ret = poll(&pollfd, 1, -1);
+ if (ret < 0 && errno != EINTR)
+ goto out_close;
+
+ if (pollfd.revents & POLLIN) {
+ n = read(trace_fd, buf, sizeof(buf));
+ if (n < 0)
+ goto out_close;
+ if (write(output_fd, buf, n) != n)
+ goto out_close;
+ byte_written += n;
+ }
+ }
+
+ /* read remaining buffer contents */
+ while (true) {
+ n = read(trace_fd, buf, sizeof(buf));
+ if (n < 0)
+ goto out_close;
+ if (n == 0)
+ break;
+ if (write(output_fd, buf, n) != n)
+ goto out_close;
+ byte_written += n;
+ }
+ fra->state = RECORD_STATE__DONE;
+
+out_close:
+ close(trace_fd);
out:
+ if (fra->state == RECORD_STATE__ERROR) {
+ /*
+ * We need to update recorder_count in this case also
+ * in order to prevent deadlocking in the main thread.
+ */
+ pthread_mutex_lock(&recorder_mutex);
+ recorder_count++;
+ pthread_cond_signal(&recorder_ready_cond);
+ pthread_mutex_unlock(&recorder_mutex);
+ }
+ return fra;
+}
+
+static void *synthesize_raw_data(struct perf_evsel *evsel)
+{
+ void *data = NULL;
+ u32 data_size;
+
+ if (!strcmp(evsel->tp_format->name, "function")) {
+ struct {
+ unsigned short common_type;
+ unsigned char common_flags;
+ unsigned char common_preempt_count;
+ int common_pid;
+ int common_padding;
+
+ unsigned long ip;
+ unsigned long parent_ip;
+ } function_format = {
+ .common_type = evsel->attr.config,
+ };
+
+ data_size = sizeof(function_format);
+
+ data = malloc(data_size + sizeof(u32));
+ if (data == NULL)
+ return NULL;
+
+ memcpy(data, &data_size, sizeof(data_size));
+
+ data += sizeof(data_size);
+ memcpy(data, &function_format, sizeof(function_format));
+ } else if (!strcmp(evsel->tp_format->name, "funcgraph_entry")) {
+ struct {
+ unsigned short common_type;
+ unsigned char common_flags;
+ unsigned char common_preempt_count;
+ int common_pid;
+ int common_padding;
+
+ unsigned long func;
+ int depth;
+ } funcgraph_entry_format = {
+ .common_type = evsel->attr.config,
+ };
+
+ data_size = sizeof(funcgraph_entry_format);
+
+ data = malloc(data_size + sizeof(u32));
+ if (data == NULL)
+ return NULL;
+
+ memcpy(data, &data_size, sizeof(data_size));
+
+ data += sizeof(data_size);
+ memcpy(data, &funcgraph_entry_format,
+ sizeof(funcgraph_entry_format));
+ }
+ return data;
+}
+
+static int do_ftrace_record(struct perf_ftrace *ftrace)
+{
+ int i, err, feat;
+ int perf_fd;
+ LIST_HEAD(recorders);
+ struct perf_session *session;
+ struct ftrace_record_arg *fra, *tmp;
+
+ signal(SIGINT, sig_handler);
+ signal(SIGUSR1, sig_handler);
+ signal(SIGCHLD, sig_handler);
+
+ if (setup_tracing_files(ftrace) < 0)
+ goto out_reset;
+
+ alloc_ftrace_evsel(ftrace);
+
+ perf_fd = create_perf_directory(ftrace);
+ if (perf_fd < 0) {
+ pr_err("failed to create perf directory\n");
+ goto out_reset;
+ }
+
+ /* just use a dummy session for header recording */
+ session = zalloc(sizeof(*session));
+ if (session == NULL) {
+ pr_err("failed to allocate perf session\n");
+ goto out_close;
+ }
+ session->evlist = ftrace->evlist;
+
+ for (feat = HEADER_FIRST_FEATURE; feat < HEADER_LAST_FEATURE; feat++)
+ perf_header__set_feat(&session->header, feat);
+
+ perf_header__clear_feat(&session->header, HEADER_BUILD_ID);
+ perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
+ perf_header__clear_feat(&session->header, HEADER_PMU_MAPPINGS);
+ perf_header__clear_feat(&session->header, HEADER_GROUP_DESC);
+
+ err = perf_session__write_header(session, ftrace->evlist,
+ perf_fd, false);
+ if (err < 0) {
+ pr_err("failed to write perf header\n");
+ goto out_session;
+ }
+
+ /*
+ * We record ftrace's per_cpu buffer so that it'd better having
+ * corresponding cpu maps anyway.
+ */
+ if (!perf_target__has_cpu(&ftrace->target)) {
+ struct cpu_map *new_map;
+
+ new_map = cpu_map__new(NULL);
+ if (new_map == NULL) {
+ pr_err("failed to create new cpu map\n");
+ goto out_session;
+ }
+
+ /* replace existing cpu map */
+ cpu_map__delete(ftrace->evlist->cpus);
+ ftrace->evlist->cpus = new_map;
+ }
+
+ for (i = 0; i < cpu_map__nr(ftrace->evlist->cpus); i++) {
+ fra = malloc(sizeof(*fra));
+ if (fra == NULL) {
+ pr_err("not enough memory!\n");
+ goto out_fra;
+ }
+
+ fra->ftrace = ftrace;
+ fra->cpu = ftrace->evlist->cpus->map[i];
+ fra->state = RECORD_STATE__INIT;
+ list_add_tail(&fra->node, &recorders);
+
+ err = pthread_create(&fra->id, NULL,
+ record_ftrace_raw_buffer, fra);
+ if (err < 0) {
+ pr_err("failed to create recorder thread\n");
+ goto out_fra;
+ }
+ }
+
+ /* wait for all recorders ready */
+ pthread_mutex_lock(&recorder_mutex);
+ while (recorder_count != cpu_map__nr(ftrace->evlist->cpus))
+ pthread_cond_wait(&recorder_ready_cond, &recorder_mutex);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ list_for_each_entry(fra, &recorders, node) {
+ if (fra->state != RECORD_STATE__READY) {
+ pr_err("cpu%d: failed to start recorder", fra->cpu);
+ goto out_fra;
+ }
+ }
+
+ if (write_tracing_file("tracing_on", "1") < 0) {
+ pr_err("can't enable tracing\n");
+ goto out_fra;
+ }
+
+ perf_evlist__start_workload(ftrace->evlist);
+
+ pr_debug2("start recording per cpu buffers\n");
+ pthread_mutex_lock(&recorder_mutex);
+ pthread_cond_broadcast(&recorder_start_cond);
+ pthread_mutex_unlock(&recorder_mutex);
+
+ /* wait for signal/finish */
+ pause();
+
+ if (write_tracing_file("tracing_on", "0") < 0) {
+ pr_err("can't disable tracing\n");
+ goto out_fra;
+ }
+
+ /* signal recorders to terminate */
+ list_for_each_entry(fra, &recorders, node) {
+ pr_debug2("killing recorder thread for cpu%d\n", fra->cpu);
+ pthread_kill(fra->id, SIGUSR2);
+ }
+
+ list_for_each_entry(fra, &recorders, node)
+ pthread_join(fra->id, NULL);
+
+ /* synthesize sample data */
+ list_for_each_entry(fra, &recorders, node) {
+ struct perf_evsel *evsel = perf_evlist__first(ftrace->evlist);
+ union perf_event event = {
+ .sample = {
+ .header = {
+ .type = PERF_RECORD_SAMPLE,
+ .misc = PERF_RECORD_MISC_KERNEL,
+ .size = sizeof(event.sample.header) +
+ evsel->sample_size,
+ },
+ },
+ };
+ struct perf_sample sample = {
+ .cpu = fra->cpu,
+ .period = 1,
+ };
+ void *raw_data;
+ u32 raw_size;
+ int orig_size;
+
+ if (fra->state != RECORD_STATE__DONE) {
+ pr_warning("recorder failed for some reason on cpu%d\n",
+ fra->cpu);
+ continue;
+ }
+
+ perf_event__synthesize_sample(&event, evsel->attr.sample_type,
+ &sample, false);
+
+ raw_data = synthesize_raw_data(evsel);
+ if (raw_data == NULL) {
+ pr_err("synthesizing raw sample failed\n");
+ goto out_fra;
+ }
+
+ /*
+ * start of raw data is the size of raw data excluding itself.
+ */
+ raw_size = sizeof(u32) + (*(u32 *) raw_data);
+
+ orig_size = event.sample.header.size;
+ event.sample.header.size += raw_size;
+
+ err = write(perf_fd, &event.sample, orig_size);
+ if (err != orig_size) {
+ pr_err("write error occurred\n");
+ goto out_fra;
+ }
+
+ err = write(perf_fd, raw_data, raw_size);
+ if (err != (int)raw_size) {
+ pr_err("write error occurred\n");
+ goto out_fra;
+ }
+
+ session->header.data_size += event.sample.header.size;
+ }
+
+ perf_session__write_header(session, ftrace->evlist, perf_fd, true);
+
+out_fra:
+ list_for_each_entry_safe(fra, tmp, &recorders, node) {
+ list_del(&fra->node);
+ free(fra);
+ }
+out_session:
+ free(session);
+out_close:
+ close(perf_fd);
+out_reset:
+ reset_tracing_files(ftrace);
return done ? 0 : -1;
}

@@ -286,6 +812,74 @@ out:
return ret;
}

+static int
+__cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret = -1;
+ const char * const record_usage[] = {
+ "perf ftrace record [<options>] [<command>]",
+ "perf ftrace record [<options>] -- <command> [<options>]",
+ NULL
+ };
+ const struct option record_options[] = {
+ OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
+ "tracer to use"),
+ OPT_STRING('p', "pid", &ftrace->target.tid, "pid",
+ "trace on existing process id"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_BOOLEAN('a', "all-cpus", &ftrace->target.system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
+ "list of cpus to monitor"),
+ OPT_STRING('o', "output", &ftrace->dirname, "dirname",
+ "input directory name to use (default: perf.data)"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, record_options, record_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc && perf_target__none(&ftrace->target))
+ usage_with_options(record_usage, record_options);
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->tracer == NULL)
+ ftrace->tracer = DEFAULT_TRACER;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0)
+ goto out_maps;
+
+ ret = do_ftrace_record(ftrace);
+
+out_maps:
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret;
@@ -293,8 +887,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
.target = { .uid = UINT_MAX, },
};
const char * const ftrace_usage[] = {
- "perf ftrace {live} [<options>] [<command>]",
- "perf ftrace {live} [<options>] -- <command> [<options>]",
+ "perf ftrace {live|record} [<options>] [<command>]",
+ "perf ftrace {live|record} [<options>] -- <command> [<options>]",
NULL
};
const struct option ftrace_options[] = {
@@ -313,6 +907,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)

if (strcmp(argv[0], "live") == 0) {
ret = __cmd_ftrace_live(&ftrace, argc, argv);
+ } else if (strncmp(argv[0], "rec", 3) == 0) {
+ ret = __cmd_ftrace_record(&ftrace, argc, argv);
} else {
usage_with_options(ftrace_usage, ftrace_options);
}
--
1.7.11.7

2013-04-23 08:32:32

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 10/14] perf ftrace: Add 'show' sub-command

From: Namhyung Kim <[email protected]>

The ftrace show subcommand is for viewing recorded ftrace files. It
enters perf.data.dir directory and open perf.header file to find out
necessary information. And then read out per-cpu trace records using
kbuffer helper and print them to stdout in time order.

It only shows its basic form so function graph doesn't show duration/
overhead and no leaf entry handling is provided yet. Maybe it can be
handled by a proper plugin in the libtraceevent.

Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 427 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 425 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 7e72cd65dd0a..c0f62bd12bb2 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -13,6 +13,7 @@
#include <signal.h>
#include <fcntl.h>
#include <dirent.h>
+#include <sys/mman.h>

#include "util/debug.h"
#include "util/parse-options.h"
@@ -21,6 +22,8 @@
#include "util/thread_map.h"
#include "util/cpumap.h"
#include "util/trace-event.h"
+#include "../lib/traceevent/kbuffer.h"
+#include "../lib/traceevent/event-parse.h"


#define DEFAULT_TRACER "function_graph"
@@ -31,6 +34,7 @@ struct perf_ftrace {
struct perf_target target;
const char *tracer;
const char *dirname;
+ struct pevent *pevent;
};

static bool done;
@@ -750,6 +754,371 @@ out_reset:
}

static int
+function_handler(struct trace_seq *s, struct pevent_record *record,
+ struct event_format *event, void *context __maybe_unused)
+{
+ struct pevent *pevent = event->pevent;
+ unsigned long long function;
+ const char *func;
+
+ if (pevent_get_field_val(s, event, "ip", record, &function, 1))
+ return trace_seq_putc(s, '!');
+
+ func = pevent_find_function(pevent, function);
+ if (func)
+ trace_seq_printf(s, "%s <-- ", func);
+ else
+ trace_seq_printf(s, "0x%llx", function);
+
+ if (pevent_get_field_val(s, event, "parent_ip", record, &function, 1))
+ return trace_seq_putc(s, '!');
+
+ func = pevent_find_function(pevent, function);
+ if (func)
+ trace_seq_printf(s, "%s", func);
+ else
+ trace_seq_printf(s, "0x%llx", function);
+
+ trace_seq_putc(s, '\n');
+ return 0;
+}
+
+#define TRACE_GRAPH_INDENT 2
+
+static int
+fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
+ struct event_format *event, void *context __maybe_unused)
+{
+ unsigned long long depth;
+ unsigned long long val;
+ const char *func;
+ int i;
+
+ if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
+ return trace_seq_putc(s, '!');
+
+ /* Function */
+ for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
+ trace_seq_putc(s, ' ');
+
+ if (pevent_get_field_val(s, event, "func", record, &val, 1))
+ return trace_seq_putc(s, '!');
+
+ func = pevent_find_function(event->pevent, val);
+
+ if (func)
+ trace_seq_printf(s, "%s() {", func);
+ else
+ trace_seq_printf(s, "%llx() {", val);
+
+ trace_seq_putc(s, '\n');
+ return 0;
+}
+
+static int
+fgraph_ret_handler(struct trace_seq *s, struct pevent_record *record,
+ struct event_format *event, void *context __maybe_unused)
+{
+ unsigned long long depth;
+ int i;
+
+ if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
+ return trace_seq_putc(s, '!');
+
+ /* Function */
+ for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
+ trace_seq_putc(s, ' ');
+
+ trace_seq_puts(s, "}\n");
+ return 0;
+}
+
+struct perf_ftrace_report {
+ struct perf_ftrace *ftrace;
+ struct perf_tool tool;
+};
+
+struct ftrace_report_arg {
+ struct list_head node;
+ struct pevent_record *record;
+ struct kbuffer *kbuf;
+ void *map;
+ int cpu;
+ int fd;
+ off_t offset;
+ off_t size;
+};
+
+static LIST_HEAD(ftrace_cpu_buffers);
+
+static int process_sample_show(struct perf_tool *tool,
+ union perf_event * event __maybe_unused,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ struct perf_ftrace *ftrace;
+ struct perf_ftrace_report *show;
+ struct ftrace_report_arg *fra;
+ struct stat statbuf;
+ enum kbuffer_long_size long_size;
+ enum kbuffer_endian endian;
+ char buf[PATH_MAX];
+
+ show = container_of(tool, struct perf_ftrace_report, tool);
+ ftrace = show->ftrace;
+
+ fra = zalloc(sizeof(*fra));
+ if (fra == NULL)
+ return -1;
+
+ fra->cpu = sample->cpu;
+
+ scnprintf(buf, sizeof(buf), "%s.dir/trace-cpu%d.buf",
+ ftrace->dirname, fra->cpu);
+
+ fra->fd = open(buf, O_RDONLY);
+ if (fra->fd < 0)
+ goto out;
+
+ if (fstat(fra->fd, &statbuf) < 0)
+ goto out_close;
+
+ fra->size = statbuf.st_size;
+ if (fra->size == 0) {
+ /* skip zero-size buffers */
+ free(fra);
+ return 0;
+ }
+
+ /*
+ * FIXME: What if pevent->page_size is smaller than current page size?
+ */
+ fra->map = mmap(NULL, pevent_get_page_size(ftrace->pevent),
+ PROT_READ, MAP_PRIVATE, fra->fd, fra->offset);
+ if (fra->map == MAP_FAILED)
+ goto out_close;
+
+ fra->offset = 0;
+
+ if (pevent_is_file_bigendian(ftrace->pevent))
+ endian = KBUFFER_ENDIAN_BIG;
+ else
+ endian = KBUFFER_ENDIAN_LITTLE;
+
+ if (pevent_get_long_size(ftrace->pevent) == 8)
+ long_size = KBUFFER_LSIZE_8;
+ else
+ long_size = KBUFFER_LSIZE_4;
+
+ fra->kbuf = kbuffer_alloc(long_size, endian);
+ if (fra->kbuf == NULL)
+ goto out_unmap;
+
+ if (ftrace->pevent->old_format)
+ kbuffer_set_old_format(fra->kbuf);
+
+ kbuffer_load_subbuffer(fra->kbuf, fra->map);
+
+ pr_debug2("setup kbuffer for cpu%d\n", fra->cpu);
+ list_add_tail(&fra->node, &ftrace_cpu_buffers);
+ return 0;
+
+out_unmap:
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+out_close:
+ close(fra->fd);
+out:
+ free(fra);
+ return -1;
+}
+
+static struct pevent_record *
+get_next_ftrace_event_record(struct perf_ftrace *ftrace,
+ struct ftrace_report_arg *fra)
+{
+ struct pevent_record *record;
+ unsigned long long ts;
+ void *data;
+
+retry:
+ data = kbuffer_read_event(fra->kbuf, &ts);
+ if (data) {
+ record = zalloc(sizeof(*record));
+ if (record == NULL) {
+ pr_err("memory allocation failure\n");
+ return NULL;
+ }
+
+ record->ts = ts;
+ record->cpu = fra->cpu;
+ record->data = data;
+ record->size = kbuffer_event_size(fra->kbuf);
+ record->record_size = kbuffer_curr_size(fra->kbuf);
+ record->offset = kbuffer_curr_offset(fra->kbuf);
+ record->missed_events = kbuffer_missed_events(fra->kbuf);
+ record->ref_count = 1;
+
+ kbuffer_next_event(fra->kbuf, NULL);
+ return record;
+ }
+
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+ fra->map = NULL;
+
+ fra->offset += pevent_get_page_size(ftrace->pevent);
+ if (fra->offset >= fra->size) {
+ /* EOF */
+ return NULL;
+ }
+
+ fra->map = mmap(NULL, pevent_get_page_size(ftrace->pevent),
+ PROT_READ, MAP_PRIVATE, fra->fd, fra->offset);
+ if (fra->map == MAP_FAILED) {
+ pr_err("memory mapping failed\n");
+ return NULL;
+ }
+
+ kbuffer_load_subbuffer(fra->kbuf, fra->map);
+
+ goto retry;
+}
+
+static struct pevent_record *
+get_ftrace_event_record(struct perf_ftrace *ftrace,
+ struct ftrace_report_arg *fra)
+{
+ if (fra->record == NULL)
+ fra->record = get_next_ftrace_event_record(ftrace, fra);
+
+ return fra->record;
+}
+
+static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
+{
+ struct ftrace_report_arg *fra = NULL;
+ struct ftrace_report_arg *tmp;
+ struct pevent_record *record;
+ unsigned long long min_ts = LLONG_MAX;
+
+ list_for_each_entry(tmp, &ftrace_cpu_buffers, node) {
+ if (perf_target__has_cpu(&ftrace->target)) {
+ int i;
+ bool found = false;
+
+ for (i = 0; i < cpu_map__nr(ftrace->evlist->cpus); i++) {
+ if (tmp->cpu == ftrace->evlist->cpus->map[i]) {
+ found = true;
+ break;
+ }
+ }
+ if (!found)
+ continue;
+ }
+
+ record = get_ftrace_event_record(ftrace, tmp);
+ if (record && record->ts < min_ts) {
+ min_ts = record->ts;
+ fra = tmp;
+ }
+ }
+
+ if (fra) {
+ record = fra->record;
+ fra->record = NULL;
+ return record;
+ }
+ return NULL;
+}
+
+static void free_ftrace_report_args(struct perf_ftrace *ftrace)
+{
+ struct ftrace_report_arg *fra, *tmp;
+
+ list_for_each_entry_safe(fra, tmp, &ftrace_cpu_buffers, node) {
+ list_del(&fra->node);
+
+ /* don't care about the errors */
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+ kbuffer_free(fra->kbuf);
+ free(fra->record);
+ close(fra->fd);
+ free(fra);
+ }
+}
+
+static int do_ftrace_show(struct perf_ftrace *ftrace)
+{
+ int ret = 0;
+ char buf[PATH_MAX];
+ struct perf_session *session;
+ struct pevent_record *record;
+ struct trace_seq seq;
+ struct perf_ftrace_report show = {
+ .ftrace = ftrace,
+ .tool = {
+ .sample = process_sample_show,
+ },
+ };
+
+ scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
+
+ session = perf_session__new(buf, O_RDONLY, false, false, &show.tool);
+ if (session == NULL) {
+ pr_err("failed to create a session\n");
+ return -1;
+ }
+
+ ftrace->pevent = session->pevent;
+
+ pevent_register_event_handler(ftrace->pevent, -1,
+ "ftrace", "function",
+ function_handler, NULL);
+ pevent_register_event_handler(ftrace->pevent, -1,
+ "ftrace", "funcgraph_entry",
+ fgraph_ent_handler, NULL);
+ pevent_register_event_handler(ftrace->pevent, -1,
+ "ftrace", "funcgraph_exit",
+ fgraph_ret_handler, NULL);
+
+ if (perf_session__process_events(session, &show.tool) < 0) {
+ pr_err("failed to process events\n");
+ ret = -1;
+ goto out;
+ }
+
+ trace_seq_init(&seq);
+
+ record = get_ordered_record(ftrace);
+ while (record) {
+ int type;
+ struct event_format *event;
+
+ type = pevent_data_type(ftrace->pevent, record);
+ event = pevent_find_event(ftrace->pevent, type);
+ if (!event) {
+ pr_warning("no event found for type %d", type);
+ continue;
+ }
+
+ pevent_print_event(ftrace->pevent, &seq, record);
+ trace_seq_do_printf(&seq);
+
+ trace_seq_reset(&seq);
+
+ free(record);
+ record = get_ordered_record(ftrace);
+ }
+
+ trace_seq_destroy(&seq);
+
+out:
+ free_ftrace_report_args(ftrace);
+ perf_session__delete(session);
+ return ret;
+}
+
+static int
__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
int ret = -1;
@@ -880,6 +1249,58 @@ out:
return ret;
}

+static int
+__cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret = -1;
+ const char * const show_usage[] = {
+ "perf ftrace show [<options>]",
+ NULL
+ };
+ const struct option show_options[] = {
+ OPT_STRING('i', "input", &ftrace->dirname, "dirname",
+ "input directory name to use (default: perf.data)"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
+ "list of cpus to monitor"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, show_options, show_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (argc)
+ usage_with_options(show_usage, show_options);
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ ret = do_ftrace_show(ftrace);
+
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret;
@@ -887,8 +1308,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
.target = { .uid = UINT_MAX, },
};
const char * const ftrace_usage[] = {
- "perf ftrace {live|record} [<options>] [<command>]",
- "perf ftrace {live|record} [<options>] -- <command> [<options>]",
+ "perf ftrace {live|record|show} [<options>] [<command>]",
+ "perf ftrace {live|record|show} [<options>] -- <command> [<options>]",
NULL
};
const struct option ftrace_options[] = {
@@ -909,6 +1330,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
ret = __cmd_ftrace_live(&ftrace, argc, argv);
} else if (strncmp(argv[0], "rec", 3) == 0) {
ret = __cmd_ftrace_record(&ftrace, argc, argv);
+ } else if (strcmp(argv[0], "show") == 0) {
+ ret = __cmd_ftrace_show(&ftrace, argc, argv);
} else {
usage_with_options(ftrace_usage, ftrace_options);
}
--
1.7.11.7

2013-04-23 08:32:29

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 12/14] perf ftrace: Use pager for displaying result

From: Namhyung Kim <[email protected]>

It's convenient to use pager when seeing many lines of result.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 97293cee5597..f4efa1475cf9 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -188,6 +188,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
+ signal(SIGPIPE, sig_handler);

if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -1461,6 +1462,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
argv, false, true) < 0)
goto out_maps;

+ setup_pager();
+
ret = do_ftrace_live(ftrace);

out_maps:
@@ -1582,6 +1585,8 @@ __cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
if (ftrace->dirname == NULL)
ftrace->dirname = DEFAULT_DIRNAME;

+ setup_pager();
+
ret = do_ftrace_show(ftrace);

perf_evlist__delete_maps(ftrace->evlist);
@@ -1643,6 +1648,7 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
perf_hpp__column_enable(PERF_HPP__OVERHEAD);
perf_hpp__init();

+ setup_pager();
setup_sorting();

symbol_conf.exclude_other = false;
--
1.7.11.7

2013-04-23 08:32:26

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 11/14] perf ftrace: Add 'report' sub-command

From: Namhyung Kim <[email protected]>

The ftrace report command is for analyzing ftrace result as usual perf
report style. Internal processing of the ftrace buffer is similar to
the 'show' sub-command but it synthesizes necessary information like
thread, dso, map and symbol from saved trace info.

It currently count number of samples as a period and it can be
extended to use calltime of funcgraph_exit in the future.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 363 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 361 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index c0f62bd12bb2..97293cee5597 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -21,6 +21,7 @@
#include "util/target.h"
#include "util/thread_map.h"
#include "util/cpumap.h"
+#include "util/sort.h"
#include "util/trace-event.h"
#include "../lib/traceevent/kbuffer.h"
#include "../lib/traceevent/event-parse.h"
@@ -35,6 +36,7 @@ struct perf_ftrace {
const char *tracer;
const char *dirname;
struct pevent *pevent;
+ bool show_full_info;
};

static bool done;
@@ -1118,6 +1120,294 @@ out:
return ret;
}

+static int process_sample_report(struct perf_tool *tool,
+ union perf_event * event __maybe_unused,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ struct perf_ftrace *ftrace;
+ struct perf_ftrace_report *report;
+ struct ftrace_report_arg *fra;
+ struct stat statbuf;
+ enum kbuffer_long_size long_size;
+ enum kbuffer_endian endian;
+ char buf[PATH_MAX];
+
+ report = container_of(tool, struct perf_ftrace_report, tool);
+ ftrace = report->ftrace;
+
+ fra = zalloc(sizeof(*fra));
+ if (fra == NULL)
+ return -1;
+
+ fra->cpu = sample->cpu;
+
+ scnprintf(buf, sizeof(buf), "%s.dir/trace-cpu%d.buf",
+ ftrace->dirname, fra->cpu);
+
+ fra->fd = open(buf, O_RDONLY);
+ if (fra->fd < 0)
+ goto out;
+
+ if (fstat(fra->fd, &statbuf) < 0)
+ goto out_close;
+
+ fra->size = statbuf.st_size;
+ if (fra->size == 0) {
+ /* skip zero-size buffers */
+ free(fra);
+ return 0;
+ }
+
+ /*
+ * FIXME: What if pevent->page_size is smaller than current page size?
+ */
+ fra->map = mmap(NULL, pevent_get_page_size(ftrace->pevent),
+ PROT_READ, MAP_PRIVATE, fra->fd, fra->offset);
+ if (fra->map == MAP_FAILED)
+ goto out_close;
+
+ fra->offset = 0;
+
+ if (pevent_is_file_bigendian(ftrace->pevent))
+ endian = KBUFFER_ENDIAN_BIG;
+ else
+ endian = KBUFFER_ENDIAN_LITTLE;
+
+ if (pevent_get_long_size(ftrace->pevent) == 8)
+ long_size = KBUFFER_LSIZE_8;
+ else
+ long_size = KBUFFER_LSIZE_4;
+
+ fra->kbuf = kbuffer_alloc(long_size, endian);
+ if (fra->kbuf == NULL)
+ goto out_unmap;
+
+ if (ftrace->pevent->old_format)
+ kbuffer_set_old_format(fra->kbuf);
+
+ kbuffer_load_subbuffer(fra->kbuf, fra->map);
+
+ pr_debug2("setup kbuffer for cpu%d\n", fra->cpu);
+ list_add_tail(&fra->node, &ftrace_cpu_buffers);
+ return 0;
+
+out_unmap:
+ munmap(fra->map, pevent_get_page_size(ftrace->pevent));
+out_close:
+ close(fra->fd);
+out:
+ free(fra);
+ return -1;
+}
+
+struct cmdline_list {
+ struct cmdline_list *next;
+ char *comm;
+ int pid;
+};
+
+struct func_list {
+ struct func_list *next;
+ unsigned long long addr;
+ char *func;
+ char *mod;
+};
+
+static int do_ftrace_report(struct perf_ftrace *ftrace)
+{
+ int ret = -1;
+ char buf[PATH_MAX];
+ unsigned long nr_samples;
+ struct perf_session *session;
+ struct perf_evsel *evsel;
+ struct pevent_record *record;
+ struct perf_ftrace_report report = {
+ .ftrace = ftrace,
+ .tool = {
+ .sample = process_sample_report,
+ },
+ };
+ struct cmdline_list *cmdline;
+ struct func_list *func;
+ struct machine *machine;
+ struct dso *dso;
+
+ scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
+
+ session = perf_session__new(buf, O_RDONLY, false, false, &report.tool);
+ if (session == NULL) {
+ pr_err("failed to create a session\n");
+ return -1;
+ }
+
+ ftrace->pevent = session->pevent;
+
+ if (perf_session__process_events(session, &report.tool) < 0) {
+ pr_err("failed to process events\n");
+ goto out;
+ }
+
+ machine = machines__findnew(&session->machines, HOST_KERNEL_ID);
+
+ /* Synthesize thread info from saved cmdlines */
+ cmdline = ftrace->pevent->cmdlist;
+ while (cmdline) {
+ struct thread *thread;
+
+ thread = machine__findnew_thread(machine, cmdline->pid);
+ if (thread && !thread->comm_set)
+ thread__set_comm(thread, cmdline->comm);
+
+ cmdline = cmdline->next;
+ }
+
+ /* Synthesize kernel dso and symbol info from saved kallsyms */
+ func = ftrace->pevent->funclist;
+ while (func) {
+ struct symbol *sym;
+
+ scnprintf(buf, sizeof(buf), "[%s]",
+ func->mod ? func->mod : "kernel.kallsyms");
+
+ dso = dso__kernel_findnew(machine, buf, NULL, DSO_TYPE_KERNEL);
+ if (dso == NULL) {
+ pr_debug("can't find or allocate dso %s\n", buf);
+ continue;
+ }
+
+ sym = symbol__new(func->addr, 0, STB_GLOBAL, func->func);
+ if (sym == NULL) {
+ pr_debug("failed to allocate new symbol\n");
+ continue;
+ }
+ symbols__insert(&dso->symbols[MAP__FUNCTION], sym);
+
+ func = func->next;
+ }
+
+ /* Generate kernel maps */
+ list_for_each_entry(dso, &machine->kernel_dsos, node) {
+ struct map *map = map__new2(0, dso, MAP__FUNCTION);
+ if (map == NULL) {
+ pr_debug("failed to allocate new map\n");
+ goto out;
+ }
+
+ symbols__fixup_end(&dso->symbols[MAP__FUNCTION]);
+ map__fixup_start(map);
+ map__fixup_end(map);
+
+ dso__set_loaded(dso, MAP__FUNCTION);
+
+ map_groups__insert(&machine->kmaps, map);
+ if (strcmp(dso->name, "[kernel.kallsyms]") == 0)
+ machine->vmlinux_maps[MAP__FUNCTION] = map;
+ }
+
+ /* FIXME: no need to get ordered */
+ record = get_ordered_record(ftrace);
+ while (record) {
+ int type;
+ struct addr_location al;
+ union perf_event event = {
+ .header = {
+ .misc = PERF_RECORD_MISC_KERNEL,
+ },
+ };
+ struct perf_sample sample = {
+ .cpu = record->cpu,
+ .raw_data = record->data,
+ .period = 1,
+ };
+ struct format_field *field;
+ unsigned long long val;
+
+ type = pevent_data_type(ftrace->pevent, record);
+ evsel = perf_evlist__find_tracepoint_by_id(session->evlist,
+ type);
+ if (evsel == NULL) {
+ pr_warning("no event found for type %d\n", type);
+ continue;
+ }
+
+ event.ip.pid = pevent_data_pid(ftrace->pevent, record);
+
+ if (!strcmp(perf_evsel__name(evsel), "ftrace:function"))
+ field = pevent_find_field(evsel->tp_format, "ip");
+ else
+ field = pevent_find_field(evsel->tp_format, "func");
+
+ if (pevent_read_number_field(field, record->data, &val) < 0) {
+ pr_err("failed to parse function address\n");
+ goto out;
+ }
+ event.ip.ip = val;
+
+ if (perf_event__preprocess_sample(&event, machine, &al,
+ &sample, NULL) < 0) {
+ pr_err("problem processing %d event, skipping it.\n",
+ event.header.type);
+ goto out;
+ }
+
+ /* TODO: update sample.period using calltime */
+ if (!__hists__add_entry(&evsel->hists, &al, NULL,
+ sample.period, 0)) {
+ pr_err("failed to add a hist entry\n");
+ goto out;
+ }
+
+ evsel->hists.stats.total_period += sample.period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ free(record);
+ record = get_ordered_record(ftrace);
+ }
+ ret = 0;
+
+ perf_session__fprintf_info(session, stdout, ftrace->show_full_info);
+
+ nr_samples = 0;
+ list_for_each_entry(evsel, &session->evlist->entries, node) {
+ struct hists *hists = &evsel->hists;
+
+ hists__collapse_resort(hists);
+ hists__output_resort(&evsel->hists);
+ nr_samples += hists->stats.nr_events[PERF_RECORD_SAMPLE];
+ }
+
+ if (nr_samples == 0) {
+ pr_warning("The %s file has no samples!\n", session->filename);
+ goto out;
+ }
+
+ list_for_each_entry(evsel, &session->evlist->entries, node) {
+ struct hists *hists = &evsel->hists;
+ const char *evname = perf_evsel__name(evsel);
+ u64 nr_events = hists->stats.total_period;
+ char unit;
+
+ nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
+ nr_samples = convert_unit(nr_samples, &unit);
+ fprintf(stdout, "# Samples: %lu%c", nr_samples, unit);
+ if (evname != NULL)
+ fprintf(stdout, " of event '%s'", evname);
+
+ fprintf(stdout, "\n# Event count (approx.): %" PRIu64, nr_events);
+ fprintf(stdout, "\n#\n");
+
+ hists__fprintf(hists, true, 0, 0, stdout);
+ fprintf(stdout, "\n\n");
+ }
+
+out:
+ free_ftrace_report_args(ftrace);
+ perf_session__delete(session);
+ return ret;
+}
+
static int
__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
@@ -1301,6 +1591,73 @@ out:
return ret;
}

+static int
+__cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ int ret = -1;
+ const char * const report_usage[] = {
+ "perf ftrace report [<options>]",
+ NULL
+ };
+ const struct option report_options[] = {
+ OPT_STRING('i', "input", &ftrace->dirname, "dirname",
+ "input directory name to use (default: perf.data)"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
+ "list of cpus to monitor"),
+ OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+ "sort by key(s): pid, comm, dso, symbol, cpu"),
+ OPT_BOOLEAN('I', "show-info", &ftrace->show_full_info,
+ "Display extended information like cpu/numa topology"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, report_options, report_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (argc)
+ usage_with_options(report_usage, report_options);
+
+ ret = perf_target__validate(&ftrace->target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+ if (ret < 0)
+ goto out;
+
+ if (ftrace->dirname == NULL)
+ ftrace->dirname = DEFAULT_DIRNAME;
+
+ perf_hpp__column_enable(PERF_HPP__OVERHEAD);
+ perf_hpp__init();
+
+ setup_sorting();
+
+ symbol_conf.exclude_other = false;
+ symbol_conf.try_vmlinux_path = false;
+ symbol__init();
+
+ ret = do_ftrace_report(ftrace);
+
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
{
int ret;
@@ -1308,8 +1665,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
.target = { .uid = UINT_MAX, },
};
const char * const ftrace_usage[] = {
- "perf ftrace {live|record|show} [<options>] [<command>]",
- "perf ftrace {live|record|show} [<options>] -- <command> [<options>]",
+ "perf ftrace {live|record|show|report} [<options>] [<command>]",
+ "perf ftrace {live|record|show|report} [<options>] -- <command> [<options>]",
NULL
};
const struct option ftrace_options[] = {
@@ -1332,6 +1689,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
ret = __cmd_ftrace_record(&ftrace, argc, argv);
} else if (strcmp(argv[0], "show") == 0) {
ret = __cmd_ftrace_show(&ftrace, argc, argv);
+ } else if (strncmp(argv[0], "rep", 3) == 0) {
+ ret = __cmd_ftrace_report(&ftrace, argc, argv);
} else {
usage_with_options(ftrace_usage, ftrace_options);
}
--
1.7.11.7

2013-04-23 08:34:02

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 06/14] perf ftrace: Add support for --pid option

From: Namhyung Kim <[email protected]>

The -p (--pid) option enables to trace existing process by its pid.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 89 ++++++++++++++++++++++++++++++++-------------
1 file changed, 63 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index a3ad246bbf54..2025161bbaed 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -11,6 +11,7 @@

#include <unistd.h>
#include <signal.h>
+#include <fcntl.h>

#include "util/debug.h"
#include "util/parse-options.h"
@@ -34,11 +35,12 @@ static void sig_handler(int sig __maybe_unused)
done = true;
}

-static int write_tracing_file(const char *name, const char *val)
+static int __write_tracing_file(const char *name, const char *val, bool append)
{
char *file;
int fd, ret = -1;
ssize_t size = strlen(val);
+ int flags = O_WRONLY;

file = get_tracing_file(name);
if (!file) {
@@ -46,7 +48,12 @@ static int write_tracing_file(const char *name, const char *val)
return -1;
}

- fd = open(file, O_WRONLY);
+ if (append)
+ flags |= O_APPEND;
+ else
+ flags |= O_TRUNC;
+
+ fd = open(file, flags);
if (fd < 0) {
pr_debug("cannot open tracing file: %s\n", name);
goto out;
@@ -62,6 +69,16 @@ out:
return ret;
}

+static int write_tracing_file(const char *name, const char *val)
+{
+ return __write_tracing_file(name, val, false);
+}
+
+static int append_tracing_file(const char *name, const char *val)
+{
+ return __write_tracing_file(name, val, true);
+}
+
static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
{
if (write_tracing_file("tracing_on", "0") < 0)
@@ -76,11 +93,27 @@ static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
return 0;
}

+static int set_tracing_pid(struct perf_ftrace *ftrace)
+{
+ int i;
+ char buf[16];
+
+ if (perf_target__has_cpu(&ftrace->target))
+ return 0;
+
+ for (i = 0; i < thread_map__nr(ftrace->evlist->threads); i++) {
+ scnprintf(buf, sizeof(buf), "%d",
+ ftrace->evlist->threads->map[i]);
+ if (append_tracing_file("set_ftrace_pid", buf) < 0)
+ return -1;
+ }
+ return 0;
+}
+
static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
{
char *trace_file;
int trace_fd;
- char *trace_pid;
char buf[4096];
struct pollfd pollfd = {
.events = POLLIN,
@@ -91,42 +124,37 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
return -1;
}

- if (argc < 1)
- return -1;
-
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);

- reset_tracing_files(ftrace);
+ if (reset_tracing_files(ftrace) < 0)
+ goto out;

/* reset ftrace buffer */
if (write_tracing_file("trace", "0") < 0)
goto out;

- if (perf_evlist__prepare_workload(ftrace->evlist, &ftrace->target,
- argv, false, true) < 0)
- goto out;
-
- if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
- pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+ if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0) {
goto out;
}

- if (asprintf(&trace_pid, "%d", ftrace->evlist->threads->map[0]) < 0) {
- pr_err("failed to allocate pid string\n");
- goto out;
+ if (set_tracing_pid(ftrace) < 0) {
+ pr_err("failed to set ftrace pid\n");
+ goto out_reset;
}

- if (write_tracing_file("set_ftrace_pid", trace_pid) < 0) {
- pr_err("failed to set pid: %s\n", trace_pid);
- goto out_free_pid;
+ if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
+ pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+ goto out_reset;
}

trace_file = get_tracing_file("trace_pipe");
if (!trace_file) {
pr_err("failed to open trace_pipe\n");
- goto out_free_pid;
+ goto out_reset;
}

trace_fd = open(trace_file, O_RDONLY);
@@ -135,7 +163,7 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)

if (trace_fd < 0) {
pr_err("failed to open trace_pipe\n");
- goto out_free_pid;
+ goto out_reset;
}

fcntl(trace_fd, F_SETFL, O_NONBLOCK);
@@ -174,11 +202,9 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)

out_close_fd:
close(trace_fd);
-out_free_pid:
- free(trace_pid);
-out:
+out_reset:
reset_tracing_files(ftrace);
-
+out:
return done ? 0 : -1;
}

@@ -196,6 +222,8 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
const struct option ftrace_options[] = {
OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
"tracer to use"),
+ OPT_STRING('p', "pid", &ftrace.target.tid, "pid",
+ "trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose"),
OPT_END()
@@ -203,9 +231,18 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)

argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
- if (!argc)
+ if (!argc && perf_target__none(&ftrace.target))
usage_with_options(ftrace_usage, ftrace_options);

+ ret = perf_target__validate(&ftrace.target);
+ if (ret) {
+ char errbuf[512];
+
+ perf_target__strerror(&ftrace.target, ret, errbuf, 512);
+ pr_err("%s\n", errbuf);
+ return -EINVAL;
+ }
+
ftrace.evlist = perf_evlist__new();
if (ftrace.evlist == NULL)
return -ENOMEM;
--
1.7.11.7

2013-04-23 08:34:05

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

From: Namhyung Kim <[email protected]>

The ftrace command is a simple wrapper of kernel's ftrace
functionality. It only supports single thread tracing currently and
just reads trace_pipe in text and then write it to stdout.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-ftrace.c | 227 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
5 files changed, 231 insertions(+)
create mode 100644 tools/perf/builtin-ftrace.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index b0f164b133d9..7c03e9d0e560 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -552,6 +552,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o
BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
+BUILTIN_OBJS += $(OUTPUT)builtin-ftrace.o
BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o
BUILTIN_OBJS += $(OUTPUT)builtin-mem.o

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
new file mode 100644
index 000000000000..a3ad246bbf54
--- /dev/null
+++ b/tools/perf/builtin-ftrace.c
@@ -0,0 +1,227 @@
+/*
+ * builtin-ftrace.c
+ *
+ * Copyright (c) 2013 LG Electronics, Namhyung Kim <[email protected]>
+ *
+ * Released under the GPL v2.
+ */
+
+#include "builtin.h"
+#include "perf.h"
+
+#include <unistd.h>
+#include <signal.h>
+
+#include "util/debug.h"
+#include "util/parse-options.h"
+#include "util/evlist.h"
+#include "util/target.h"
+#include "util/thread_map.h"
+
+
+#define DEFAULT_TRACER "function_graph"
+
+struct perf_ftrace {
+ struct perf_evlist *evlist;
+ struct perf_target target;
+ const char *tracer;
+};
+
+static bool done;
+
+static void sig_handler(int sig __maybe_unused)
+{
+ done = true;
+}
+
+static int write_tracing_file(const char *name, const char *val)
+{
+ char *file;
+ int fd, ret = -1;
+ ssize_t size = strlen(val);
+
+ file = get_tracing_file(name);
+ if (!file) {
+ pr_debug("cannot get tracing file: %s\n", name);
+ return -1;
+ }
+
+ fd = open(file, O_WRONLY);
+ if (fd < 0) {
+ pr_debug("cannot open tracing file: %s\n", name);
+ goto out;
+ }
+
+ if (write(fd, val, size) == size)
+ ret = 0;
+ else
+ pr_debug("write '%s' to tracing/%s failed\n", val, name);
+
+out:
+ put_tracing_file(file);
+ return ret;
+}
+
+static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
+{
+ if (write_tracing_file("tracing_on", "0") < 0)
+ return -1;
+
+ if (write_tracing_file("current_tracer", "nop") < 0)
+ return -1;
+
+ if (write_tracing_file("set_ftrace_pid", " ") < 0)
+ return -1;
+
+ return 0;
+}
+
+static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+ char *trace_file;
+ int trace_fd;
+ char *trace_pid;
+ char buf[4096];
+ struct pollfd pollfd = {
+ .events = POLLIN,
+ };
+
+ if (geteuid() != 0) {
+ pr_err("ftrace only works for root!\n");
+ return -1;
+ }
+
+ if (argc < 1)
+ return -1;
+
+ signal(SIGINT, sig_handler);
+ signal(SIGUSR1, sig_handler);
+ signal(SIGCHLD, sig_handler);
+
+ reset_tracing_files(ftrace);
+
+ /* reset ftrace buffer */
+ if (write_tracing_file("trace", "0") < 0)
+ goto out;
+
+ if (perf_evlist__prepare_workload(ftrace->evlist, &ftrace->target,
+ argv, false, true) < 0)
+ goto out;
+
+ if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
+ pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+ goto out;
+ }
+
+ if (asprintf(&trace_pid, "%d", ftrace->evlist->threads->map[0]) < 0) {
+ pr_err("failed to allocate pid string\n");
+ goto out;
+ }
+
+ if (write_tracing_file("set_ftrace_pid", trace_pid) < 0) {
+ pr_err("failed to set pid: %s\n", trace_pid);
+ goto out_free_pid;
+ }
+
+ trace_file = get_tracing_file("trace_pipe");
+ if (!trace_file) {
+ pr_err("failed to open trace_pipe\n");
+ goto out_free_pid;
+ }
+
+ trace_fd = open(trace_file, O_RDONLY);
+
+ put_tracing_file(trace_file);
+
+ if (trace_fd < 0) {
+ pr_err("failed to open trace_pipe\n");
+ goto out_free_pid;
+ }
+
+ fcntl(trace_fd, F_SETFL, O_NONBLOCK);
+ pollfd.fd = trace_fd;
+
+ if (write_tracing_file("tracing_on", "1") < 0) {
+ pr_err("can't enable tracing\n");
+ goto out_close_fd;
+ }
+
+ perf_evlist__start_workload(ftrace->evlist);
+
+ while (!done) {
+ if (poll(&pollfd, 1, -1) < 0)
+ break;
+
+ if (pollfd.revents & POLLIN) {
+ int n = read(trace_fd, buf, sizeof(buf));
+ if (n < 0)
+ break;
+ if (fwrite(buf, n, 1, stdout) != 1)
+ break;
+ }
+ }
+
+ write_tracing_file("tracing_on", "0");
+
+ /* read remaining buffer contents */
+ while (true) {
+ int n = read(trace_fd, buf, sizeof(buf));
+ if (n <= 0)
+ break;
+ if (fwrite(buf, n, 1, stdout) != 1)
+ break;
+ }
+
+out_close_fd:
+ close(trace_fd);
+out_free_pid:
+ free(trace_pid);
+out:
+ reset_tracing_files(ftrace);
+
+ return done ? 0 : -1;
+}
+
+int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+ int ret;
+ struct perf_ftrace ftrace = {
+ .target = { .uid = UINT_MAX, },
+ };
+ const char * const ftrace_usage[] = {
+ "perf ftrace [<options>] [<command>]",
+ "perf ftrace [<options>] -- <command> [<options>]",
+ NULL
+ };
+ const struct option ftrace_options[] = {
+ OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
+ "tracer to use"),
+ OPT_INCR('v', "verbose", &verbose,
+ "be more verbose"),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(ftrace_usage, ftrace_options);
+
+ ftrace.evlist = perf_evlist__new();
+ if (ftrace.evlist == NULL)
+ return -ENOMEM;
+
+ ret = perf_evlist__create_maps(ftrace.evlist, &ftrace.target);
+ if (ret < 0)
+ goto out_delete_evlist;
+
+ if (ftrace.tracer == NULL)
+ ftrace.tracer = DEFAULT_TRACER;
+
+ ret = __cmd_ftrace(&ftrace, argc, argv);
+
+ perf_evlist__delete_maps(ftrace.evlist);
+out_delete_evlist:
+ perf_evlist__delete(ftrace.evlist);
+
+ return ret;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index b210d62907e4..55da75c4e990 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -37,6 +37,7 @@ extern int cmd_test(int argc, const char **argv, const char *prefix);
extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_inject(int argc, const char **argv, const char *prefix);
extern int cmd_mem(int argc, const char **argv, const char *prefix);
+extern int cmd_ftrace(int argc, const char **argv, const char *prefix);

extern int find_scripts(char **scripts_array, char **scripts_path_array);
#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 0906fc401c52..c24d76e61bc1 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -9,6 +9,7 @@ perf-buildid-cache mainporcelain common
perf-buildid-list mainporcelain common
perf-diff mainporcelain common
perf-evlist mainporcelain common
+perf-ftrace mainporcelain common
perf-inject mainporcelain common
perf-kmem mainporcelain common
perf-kvm mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 85e1aed95204..8966f4b718b8 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -61,6 +61,7 @@ static struct cmd_struct commands[] = {
#endif
{ "inject", cmd_inject, 0 },
{ "mem", cmd_mem, 0 },
+ { "ftrace", cmd_ftrace, 0 },
};

struct pager_config {
--
1.7.11.7

2013-04-23 08:34:00

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 07/14] perf ftrace: Add support for -a and -C option

From: Namhyung Kim <[email protected]>

The -a/--all-cpus and -C/--cpu option is for controlling tracing cpus.

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 2025161bbaed..04ca2303b63b 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -18,6 +18,7 @@
#include "util/evlist.h"
#include "util/target.h"
#include "util/thread_map.h"
+#include "util/cpumap.h"


#define DEFAULT_TRACER "function_graph"
@@ -110,6 +111,26 @@ static int set_tracing_pid(struct perf_ftrace *ftrace)
return 0;
}

+static int set_tracing_cpu(struct perf_ftrace *ftrace)
+{
+ int i;
+ unsigned long cpumask = 0;
+ char cpumask_str[sizeof(cpumask)*2 + 1];
+
+ if (!perf_target__has_cpu(&ftrace->target))
+ return 0;
+
+ for (i = 0; i < cpu_map__nr(ftrace->evlist->cpus); i++)
+ cpumask |= 1 << ftrace->evlist->cpus->map[i];
+
+ scnprintf(cpumask_str, sizeof(cpumask_str), "%lx", cpumask);
+
+ if (write_tracing_file("tracing_cpumask", cpumask_str) < 0)
+ return -1;
+
+ return 0;
+}
+
static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
{
char *trace_file;
@@ -146,6 +167,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
goto out_reset;
}

+ if (set_tracing_cpu(ftrace) < 0) {
+ pr_err("failed to set tracing cpumask\n");
+ goto out_reset;
+ }
+
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
goto out_reset;
@@ -226,6 +252,10 @@ int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
"trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose"),
+ OPT_BOOLEAN('a', "all-cpus", &ftrace.target.system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_STRING('C', "cpu", &ftrace.target.cpu_list, "cpu",
+ "list of cpus to monitor"),
OPT_END()
};

--
1.7.11.7

2013-04-23 08:33:59

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 08/14] perf ftrace: Split "live" sub-command

From: Namhyung Kim <[email protected]>

Separate out the default behavior to "live" subcommand. It's a
preparation to support more subcommands like "record" and "report".

Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-ftrace.c | 109 +++++++++++++++++++++++++++-----------------
1 file changed, 68 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 04ca2303b63b..6d991c44948f 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -131,7 +131,7 @@ static int set_tracing_cpu(struct perf_ftrace *ftrace)
return 0;
}

-static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
+static int do_ftrace_live(struct perf_ftrace *ftrace)
{
char *trace_file;
int trace_fd;
@@ -140,11 +140,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
.events = POLLIN,
};

- if (geteuid() != 0) {
- pr_err("ftrace only works for root!\n");
- return -1;
- }
-
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
@@ -156,12 +151,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
if (write_tracing_file("trace", "0") < 0)
goto out;

- if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0) {
- goto out;
- }
-
if (set_tracing_pid(ftrace) < 0) {
pr_err("failed to set ftrace pid\n");
goto out_reset;
@@ -234,61 +223,99 @@ out:
return done ? 0 : -1;
}

-int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+static int
+__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
{
- int ret;
- struct perf_ftrace ftrace = {
- .target = { .uid = UINT_MAX, },
- };
- const char * const ftrace_usage[] = {
- "perf ftrace [<options>] [<command>]",
- "perf ftrace [<options>] -- <command> [<options>]",
+ int ret = -1;
+ const char * const live_usage[] = {
+ "perf ftrace live [<options>] [<command>]",
+ "perf ftrace live [<options>] -- <command> [<options>]",
NULL
};
- const struct option ftrace_options[] = {
- OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
+ const struct option live_options[] = {
+ OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
"tracer to use"),
- OPT_STRING('p', "pid", &ftrace.target.tid, "pid",
+ OPT_STRING('p', "pid", &ftrace->target.tid, "pid",
"trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose"),
- OPT_BOOLEAN('a', "all-cpus", &ftrace.target.system_wide,
+ OPT_BOOLEAN('a', "all-cpus", &ftrace->target.system_wide,
"system-wide collection from all CPUs"),
- OPT_STRING('C', "cpu", &ftrace.target.cpu_list, "cpu",
+ OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
"list of cpus to monitor"),
OPT_END()
};

- argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
- PARSE_OPT_STOP_AT_NON_OPTION);
- if (!argc && perf_target__none(&ftrace.target))
- usage_with_options(ftrace_usage, ftrace_options);
+ argc = parse_options(argc, argv, live_options, live_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc && perf_target__none(&ftrace->target))
+ usage_with_options(live_usage, live_options);

- ret = perf_target__validate(&ftrace.target);
+ ret = perf_target__validate(&ftrace->target);
if (ret) {
char errbuf[512];

- perf_target__strerror(&ftrace.target, ret, errbuf, 512);
+ perf_target__strerror(&ftrace->target, ret, errbuf, 512);
pr_err("%s\n", errbuf);
return -EINVAL;
}

- ftrace.evlist = perf_evlist__new();
- if (ftrace.evlist == NULL)
+ ftrace->evlist = perf_evlist__new();
+ if (ftrace->evlist == NULL)
return -ENOMEM;

- ret = perf_evlist__create_maps(ftrace.evlist, &ftrace.target);
+ ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
if (ret < 0)
- goto out_delete_evlist;
+ goto out;

- if (ftrace.tracer == NULL)
- ftrace.tracer = DEFAULT_TRACER;
+ if (ftrace->tracer == NULL)
+ ftrace->tracer = DEFAULT_TRACER;

- ret = __cmd_ftrace(&ftrace, argc, argv);
+ if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0)
+ goto out_maps;
+
+ ret = do_ftrace_live(ftrace);

- perf_evlist__delete_maps(ftrace.evlist);
-out_delete_evlist:
- perf_evlist__delete(ftrace.evlist);
+out_maps:
+ perf_evlist__delete_maps(ftrace->evlist);
+out:
+ perf_evlist__delete(ftrace->evlist);
+
+ return ret;
+}
+
+int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+ int ret;
+ struct perf_ftrace ftrace = {
+ .target = { .uid = UINT_MAX, },
+ };
+ const char * const ftrace_usage[] = {
+ "perf ftrace {live} [<options>] [<command>]",
+ "perf ftrace {live} [<options>] -- <command> [<options>]",
+ NULL
+ };
+ const struct option ftrace_options[] = {
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(ftrace_usage, ftrace_options);
+
+ if (geteuid() != 0) {
+ pr_err("ftrace only works for root!\n");
+ return -1;
+ }
+
+ if (strcmp(argv[0], "live") == 0) {
+ ret = __cmd_ftrace_live(&ftrace, argc, argv);
+ } else {
+ usage_with_options(ftrace_usage, ftrace_options);
+ }

return ret;
}
--
1.7.11.7

2013-04-23 08:35:36

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 02/14] perf util: Use evsel->name to get tracepoint_paths

From: Namhyung Kim <[email protected]>

Most tracepoint events already have their system and event name in
->name field so that searching whole event tracing directory for each
evsel to match given id is suboptimal.

Cc: Jiri Olsa <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/trace-event-info.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index ab18bf12d54a..dcfc1869c9af 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -414,12 +414,31 @@ get_tracepoints_path(struct list_head *pattrs)
if (pos->attr.type != PERF_TYPE_TRACEPOINT)
continue;
++nr_tracepoints;
+
+ if (pos->name && strchr(pos->name, ':')) {
+ char *str = strchr(pos->name, ':');
+
+ ppath->next = zalloc(sizeof(path));
+ if (!ppath->next)
+ goto error;
+
+ ppath->next->system = strndup(pos->name, str - pos->name);
+ ppath->next->name = strdup(str+1);
+
+ if (!ppath->next->system || !ppath->next->name)
+ goto error;
+
+ goto next;
+ }
+
ppath->next = tracepoint_id_to_path(pos->attr.config);
if (!ppath->next) {
+error:
pr_debug("No memory to alloc tracepoints list\n");
put_tracepoints_path(&path);
return NULL;
}
+next:
ppath = ppath->next;
}

--
1.7.11.7

2013-04-23 08:35:33

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 04/14] perf util: Add more debug message on failure path

From: Namhyung Kim <[email protected]>

It's helpful for debugging on tracing features.

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/header.c | 4 ++-
tools/perf/util/trace-event-read.c | 53 ++++++++++++++++++++++++++------------
2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 326068a593a5..31bd962801a1 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2756,8 +2756,10 @@ static int perf_evsel__prepare_tracepoint_event(struct perf_evsel *evsel,
}

event = pevent_find_event(pevent, evsel->attr.config);
- if (event == NULL)
+ if (event == NULL) {
+ pr_debug("cannot find event format for %d\n", (int)evsel->attr.config);
return -1;
+ }

if (!evsel->name) {
snprintf(bf, sizeof(bf), "%s:%s", event->system, event->name);
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index e084e5e654b6..0e3b3f527320 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -262,39 +262,53 @@ static int read_header_files(struct pevent *pevent)

static int read_ftrace_file(struct pevent *pevent, unsigned long long size)
{
+ int ret;
char *buf;

buf = malloc(size);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

- if (do_read(buf, size) < 0) {
- free(buf);
- return -1;
+ ret = do_read(buf, size);
+ if (ret < 0) {
+ pr_debug("error reading ftrace file.\n");
+ goto out;
}

- parse_ftrace_file(pevent, buf, size);
+ ret = parse_ftrace_file(pevent, buf, size);
+ if (ret < 0)
+ pr_debug("error parsing ftrace file.\n");
+out:
free(buf);
- return 0;
+ return ret;
}

static int read_event_file(struct pevent *pevent, char *sys,
unsigned long long size)
{
+ int ret;
char *buf;

buf = malloc(size);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

- if (do_read(buf, size) < 0) {
+ ret = do_read(buf, size);
+ if (ret < 0) {
free(buf);
- return -1;
+ goto out;
}

- parse_event_file(pevent, buf, size, sys);
+ ret = parse_event_file(pevent, buf, size, sys);
+ if (ret < 0)
+ pr_debug("error parsing event file.\n");
+out:
free(buf);
- return 0;
+ return ret;
}

static int read_ftrace_files(struct pevent *pevent)
@@ -347,6 +361,7 @@ static int read_saved_cmdline(struct pevent *pevent)
{
unsigned long long size;
char *buf;
+ int ret;

/* it can have 0 size */
size = read8(pevent);
@@ -354,18 +369,22 @@ static int read_saved_cmdline(struct pevent *pevent)
return 0;

buf = malloc(size + 1);
- if (buf == NULL)
+ if (buf == NULL) {
+ pr_debug("memory allocation failure\n");
return -1;
+ }

- if (do_read(buf, size) < 0) {
- free(buf);
- return -1;
+ ret = do_read(buf, size);
+ if (ret < 0) {
+ pr_debug("error reading saved cmdlines\n");
+ goto out;
}

parse_saved_cmdline(pevent, buf, size);
-
+ ret = 0;
+out:
free(buf);
- return 0;
+ return ret;
}

ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
--
1.7.11.7

2013-04-23 08:35:31

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 03/14] perf util: Save pid-cmdline mapping into tracing header

From: Namhyung Kim <[email protected]>

Current trace info data lacks the saved cmdline mapping which is
needed for pevent to find out the comm of a task. Add this and bump
up the version number so that perf can determine its presence when
reading.

This is mostly corresponding to trace.dat file version 6, but still
lacks 4 byte of number of cpus, and 10 bytes of type string - and I
think we don't need those anyway.

Cc: Steven Rostedt <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/trace-event-info.c | 33 ++++++++++++++++++++++++++++++++-
tools/perf/util/trace-event-parse.c | 17 +++++++++++++++++
tools/perf/util/trace-event-read.c | 36 ++++++++++++++++++++++++++++++++++--
tools/perf/util/trace-event.h | 1 +
4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index dcfc1869c9af..5f6cba50efaf 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -41,7 +41,7 @@
#include <lk/debugfs.h>
#include "evsel.h"

-#define VERSION "0.5"
+#define VERSION "0.6"

static int output_fd;

@@ -390,6 +390,34 @@ out:
return err;
}

+static int record_saved_cmdline(void)
+{
+ unsigned int size;
+ char *path;
+ struct stat st;
+ int ret, err = 0;
+
+ path = get_tracing_file("saved_cmdlines");
+ if (!path) {
+ pr_debug("can't get tracing/saved_cmdline");
+ return -ENOMEM;
+ }
+
+ ret = stat(path, &st);
+ if (ret < 0) {
+ /* not found */
+ size = 0;
+ if (write(output_fd, &size, 8) != 8)
+ err = -EIO;
+ goto out;
+ }
+ err = record_file(path, 8);
+
+out:
+ put_tracing_file(path);
+ return err;
+}
+
static void
put_tracepoints_path(struct tracepoint_path *tps)
{
@@ -554,6 +582,9 @@ struct tracing_data *tracing_data_get(struct list_head *pattrs,
if (err)
goto out;
err = record_ftrace_printk();
+ if (err)
+ goto out;
+ err = record_saved_cmdline();

out:
/*
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index fe7a27d67d2b..ef09e4720e04 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -232,6 +232,23 @@ void parse_ftrace_printk(struct pevent *pevent,
}
}

+void parse_saved_cmdline(struct pevent *pevent,
+ char *file, unsigned int size __maybe_unused)
+{
+ char *comm;
+ char *line;
+ char *next = NULL;
+ int pid;
+
+ line = strtok_r(file, "\n", &next);
+ while (line) {
+ sscanf(line, "%d %ms", &pid, &comm);
+ pevent_register_comm(pevent, comm, pid);
+ free(comm);
+ line = strtok_r(NULL, "\n", &next);
+ }
+}
+
int parse_ftrace_file(struct pevent *pevent, char *buf, unsigned long size)
{
return pevent_parse_event(pevent, buf, size, "ftrace");
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index f2112270c663..e084e5e654b6 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -343,6 +343,31 @@ static int read_event_files(struct pevent *pevent)
return 0;
}

+static int read_saved_cmdline(struct pevent *pevent)
+{
+ unsigned long long size;
+ char *buf;
+
+ /* it can have 0 size */
+ size = read8(pevent);
+ if (!size)
+ return 0;
+
+ buf = malloc(size + 1);
+ if (buf == NULL)
+ return -1;
+
+ if (do_read(buf, size) < 0) {
+ free(buf);
+ return -1;
+ }
+
+ parse_saved_cmdline(pevent, buf, size);
+
+ free(buf);
+ return 0;
+}
+
ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
{
char buf[BUFSIZ];
@@ -383,10 +408,11 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
return -1;
if (show_version)
printf("version = %s\n", version);
- free(version);

- if (do_read(buf, 1) < 0)
+ if (do_read(buf, 1) < 0) {
+ free(version);
return -1;
+ }
file_bigendian = buf[0];
host_bigendian = bigendian();

@@ -422,6 +448,11 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
err = read_ftrace_printk(pevent);
if (err)
goto out;
+ if (!strcmp(version, "0.6")) {
+ err = read_saved_cmdline(pevent);
+ if (err)
+ goto out;
+ }

size = trace_data_size;
repipe = false;
@@ -438,5 +469,6 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
out:
if (pevent)
pevent_free(pevent);
+ free(version);
return size;
}
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 929baae82f71..fbc8a19552b7 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -30,6 +30,7 @@ void *raw_field_ptr(struct event_format *event, const char *name, void *data);

void parse_proc_kallsyms(struct pevent *pevent, char *file, unsigned int size);
void parse_ftrace_printk(struct pevent *pevent, char *file, unsigned int size);
+void parse_saved_cmdline(struct pevent *pevent, char *file, unsigned int size);

ssize_t trace_report(int fd, struct pevent **pevent, bool repipe);

--
1.7.11.7

2013-04-23 13:07:27

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 02/14] perf util: Use evsel->name to get tracepoint_paths

On Tue, 2013-04-23 at 17:31 +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> Most tracepoint events already have their system and event name in
> ->name field so that searching whole event tracing directory for each
> evsel to match given id is suboptimal.
>
> Cc: Jiri Olsa <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/trace-event-info.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
> index ab18bf12d54a..dcfc1869c9af 100644
> --- a/tools/perf/util/trace-event-info.c
> +++ b/tools/perf/util/trace-event-info.c
> @@ -414,12 +414,31 @@ get_tracepoints_path(struct list_head *pattrs)
> if (pos->attr.type != PERF_TYPE_TRACEPOINT)
> continue;
> ++nr_tracepoints;
> +
> + if (pos->name && strchr(pos->name, ':')) {
> + char *str = strchr(pos->name, ':');

Why not make the above into:

if (pos->name && (str = strchr(pos->name, ':'))) {

?

-- Steve


2013-04-23 15:53:57

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 14/14] perf tools: Add document for perf-ftrace command

On Tue, 2013-04-23 at 17:31 +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/Documentation/perf-ftrace.txt | 114 +++++++++++++++++++++++++++++++
> 1 file changed, 114 insertions(+)
> create mode 100644 tools/perf/Documentation/perf-ftrace.txt
>
> diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
> new file mode 100644
> index 000000000000..f530cfabd8a5
> --- /dev/null
> +++ b/tools/perf/Documentation/perf-ftrace.txt
> @@ -0,0 +1,114 @@
> +perf-ftrace(1)
> +==============
> +
> +NAME
> +----
> +perf-ftrace - A front end for kernel's ftrace
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'perf ftrace' {live|record} [<options>] [<command>]
> +'perf ftrace' {show|report} [<options>]
> +
> +DESCRIPTION
> +-----------
> +This command reads the ftrace buffer and displays the trace recorded.
> +
> +There are several variants of perf ftrace:
> +
> + 'perf ftrace live <command>' to see a live trace of kernel functions
> + via trace_pipe during executing the <command>. If <command> is not
> + specified, one of target options (-p, -a or -C) should be given. It

one of *the* target options

> + just print out the result to stdout and doesn't save any files.

just *prints* out the result

> +
> + 'perf ftrace record <command>' to record trace entries of kernel
> + functions during the execution of the <command>. Like 'perf ftrace
> + live', at least one of <command> and/or target options should be
> + given in order to start recording. The recorded results are saved
> + under a directory ('perf.data.dir' by default) and will be used by
> + other perf-ftrace tools like 'show' and 'report'.
> +
> + 'perf ftrace show' to see the trace of recorded functions. It shows
> + functions sorted by time so the end result might be interspersed if
> + there's a concurrent execution.
> +
> + 'perf ftrace report' to display the result in an usual perf-report
> + style - entries are sorted by given sort keys and output is resorted
> + by its overhead.
> +
> +OPTIONS
> +-------
> +<command>...::
> + Any command you can specify in a shell.
> +
> +-t::
> +--tracer=::
> + The ftrace tracer to be used (default: function_graph).
> + Currently only 'function' and 'function_graph' are supported.
> + Used by 'live' and 'record' subcommands only.
> +
> +-p::
> +--pid=::
> + Record events on existing process ID (comma separated list).
> + Used by 'live' and 'record' subcommands only.
> +
> +-a::
> + Force system-wide collection. Scripts run without a <command>
> + normally use -a by default, while scripts run with a <command>
> + normally don't - this option allows the latter to be run in
> + system-wide mode. Used by 'live' and 'record' subcommands only.
> +
> +-C::
> +--cpu:: Only process samples for the list of CPUs provided.
> + Multiple CPUs can be provided as a comma-separated list with
> + no space: 0,1. Ranges of CPUs are specified with -: 0-2.
> + Default is to report samples on all online CPUs.
> +
> +-s::
> +--sort=::
> + Sort histogram entries by given key(s) - multiple keys can be
> + specified in CSV format. Following sort keys are available:
> + pid, comm, dso, symbol, cpu.
> +
> + Each key has following meaning:
> +
> + - comm: command (name) of the task
> + - pid: command and tid of the task
> + - dso: name of library or module executed at the time of sample
> + - symbol: name of function executed at the time of sample
> + - cpu: cpu number the task ran at the time of sample
> +
> + By default, comm, dso and symbol keys are used.
> + (i.e. --sort comm,dso,symbol)
> + Used by 'report' subcommands only.
> +
> +-i::
> +--input=::
> + Input directory name excluding '.dir' at the end.
> + (default: perf.data)
> +
> +-o::
> +--output=::
> + Output directory name excluding '.dir' at the end.
> + (default: perf.data)
> +
> +-v::
> +--verbose::
> + Be more verbose (show counter open errors, etc).
> +
> +-D::
> +--dump-raw-trace::
> + Dump raw trace in ASCII. Used by 'report' subcommands only.
> +
> +-I::
> +--show-info::
> + Display extended information about the record. This adds
> + information which may be very large and thus may clutter the
> + display. It currently includes: cpu and numa topology of the
> + host system. It can only be used with the 'report' subcommand.

Nice!

-- Steve

> +
> +SEE ALSO
> +--------
> +linkperf:perf-record[1], linkperf:perf-report[1],
> +linkperf:perf-script[1]

2013-04-23 15:58:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
> Hello,
>
> This patchset implements a front-end tool for kernel's ftrace. It
> uses function_graph tracer by default and normal function tracer is
> also supported. (Of course you need to enable those tracers in your
> kernel first.)
>
> NOTE: It's in very early stage, so may contain many rough edges.
>

Very nice Namhyung, thanks for doing this. I did a quick run through of
the patches and I have no complaints about them. I'm not sure how the
others will feel about it.

Also, have you given thought on how to execute both ftrace and the pmu
counters? That is, to get a way to interleave the data?

I added a 'perf' clock to ftrace. I didn't see that being checked,
although I didn't look too hard. If possible, you can use that, as well
as the multi-buffers that are coming in 3.10.

I would also like to add a system call to allow you to get to the ftrace
data without the need for debugfs. I'm not sure if we should piggy back
on the perf system call, or add a new one. I'm thinking we should add a
new one so that it doesn't get too confusing.

Hope others have comments,

-- Steve

2013-04-23 17:19:40

by Pekka Enberg

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

Hello,

On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
>> This patchset implements a front-end tool for kernel's ftrace. It
>> uses function_graph tracer by default and normal function tracer is
>> also supported. (Of course you need to enable those tracers in your
>> kernel first.)

On Tue, Apr 23, 2013 at 6:58 PM, Steven Rostedt <[email protected]> wrote:
> Very nice Namhyung, thanks for doing this. I did a quick run through of
> the patches and I have no complaints about them. I'm not sure how the
> others will feel about it.

I also skimmed through the patches and they look like a reasonable
starting point. So FWIW,

Acked-by: Pekka Enberg <[email protected]>

Are there any plans in trying to consolidate 'perf trace' and this? As
a user I don't really care what the underlying mechanism and would
like things to just work out of the box.

Pekka

2013-04-24 06:50:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command


* Pekka Enberg <[email protected]> wrote:

> Hello,
>
> On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
> >> This patchset implements a front-end tool for kernel's ftrace. It
> >> uses function_graph tracer by default and normal function tracer is
> >> also supported. (Of course you need to enable those tracers in your
> >> kernel first.)
>
> On Tue, Apr 23, 2013 at 6:58 PM, Steven Rostedt <[email protected]> wrote:
> > Very nice Namhyung, thanks for doing this. I did a quick run through of
> > the patches and I have no complaints about them. I'm not sure how the
> > others will feel about it.
>
> I also skimmed through the patches and they look like a reasonable
> starting point. So FWIW,
>
> Acked-by: Pekka Enberg <[email protected]>
>
> Are there any plans in trying to consolidate 'perf trace' and this? As
> a user I don't really care what the underlying mechanism and would
> like things to just work out of the box.

Exactly. I'd love to see ftrace and perf trace unified.

I have no objections against having a 'perf ftrace' interim step - as long
as it's really a temporary migration construct, not something we intend to
keep forever.

Thanks,

Ingo

2013-04-24 09:27:22

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

Hi Steve,

On Tue, 23 Apr 2013 11:58:01 -0400, Steven Rostedt wrote:
> On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
>> Hello,
>>
>> This patchset implements a front-end tool for kernel's ftrace. It
>> uses function_graph tracer by default and normal function tracer is
>> also supported. (Of course you need to enable those tracers in your
>> kernel first.)
>>
>> NOTE: It's in very early stage, so may contain many rough edges.
>>
>
> Very nice Namhyung, thanks for doing this. I did a quick run through of
> the patches and I have no complaints about them. I'm not sure how the
> others will feel about it.

Thanks!

>
> Also, have you given thought on how to execute both ftrace and the pmu
> counters? That is, to get a way to interleave the data?

I didn't think about it yet. I just thought enabling event tracing in
ftrace and get the data along with the function tracing. I'm not sure
how they interfere each other when enabled at the same time.

If that's not a valid concern, I think it's doable.


> I added a 'perf' clock to ftrace. I didn't see that being checked,
> although I didn't look too hard. If possible, you can use that, as well
> as the multi-buffers that are coming in 3.10.

Will look at it later.

>
> I would also like to add a system call to allow you to get to the ftrace
> data without the need for debugfs. I'm not sure if we should piggy back
> on the perf system call, or add a new one. I'm thinking we should add a
> new one so that it doesn't get too confusing.

Could you elaborate on it more? I cannot see how it'll look like..

>
> Hope others have comments,

Thanks for your kind review.
Namhyung

2013-04-24 09:30:02

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

Hi Pekka,

On Tue, 23 Apr 2013 20:19:37 +0300, Pekka Enberg wrote:
> Hello,
>
> On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
>>> This patchset implements a front-end tool for kernel's ftrace. It
>>> uses function_graph tracer by default and normal function tracer is
>>> also supported. (Of course you need to enable those tracers in your
>>> kernel first.)
>
> On Tue, Apr 23, 2013 at 6:58 PM, Steven Rostedt <[email protected]> wrote:
>> Very nice Namhyung, thanks for doing this. I did a quick run through of
>> the patches and I have no complaints about them. I'm not sure how the
>> others will feel about it.
>
> I also skimmed through the patches and they look like a reasonable
> starting point. So FWIW,
>
> Acked-by: Pekka Enberg <[email protected]>

Thank you!


> Are there any plans in trying to consolidate 'perf trace' and this? As
> a user I don't really care what the underlying mechanism and would
> like things to just work out of the box.

No plans currently. I'll try to think about it more..

Thanks,
Namhyung

2013-04-24 09:52:25

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

Hi Ingo,

On Wed, 24 Apr 2013 08:50:18 +0200, Ingo Molnar wrote:
> * Pekka Enberg <[email protected]> wrote:
>
>> Hello,
>>
>> On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
>> >> This patchset implements a front-end tool for kernel's ftrace. It
>> >> uses function_graph tracer by default and normal function tracer is
>> >> also supported. (Of course you need to enable those tracers in your
>> >> kernel first.)
>>
>> On Tue, Apr 23, 2013 at 6:58 PM, Steven Rostedt <[email protected]> wrote:
>> > Very nice Namhyung, thanks for doing this. I did a quick run through of
>> > the patches and I have no complaints about them. I'm not sure how the
>> > others will feel about it.
>>
>> I also skimmed through the patches and they look like a reasonable
>> starting point. So FWIW,
>>
>> Acked-by: Pekka Enberg <[email protected]>
>>
>> Are there any plans in trying to consolidate 'perf trace' and this? As
>> a user I don't really care what the underlying mechanism and would
>> like things to just work out of the box.
>
> Exactly. I'd love to see ftrace and perf trace unified.
>
> I have no objections against having a 'perf ftrace' interim step - as long
> as it's really a temporary migration construct, not something we intend to
> keep forever.

Hmm.. okay.

I think it'd be great if perf trace can cover both of kernel and user
spaces. But function tracing in userspace looks impossible IMHO.

So how about changing perf trace to receive proposed sub-commands
(live/record/show/report) and following options:

--kernel:: do ftrace (for kernel functions)
--syscall:: do system call tracing (what perf trace does now)


But I think we need to have an agreement on the file/directory format to
support multiple recorder threads first. Any comments on this issue?

Thanks,
Namhyung

2013-04-24 10:36:53

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 02/14] perf util: Use evsel->name to get tracepoint_paths

On Tue, 23 Apr 2013 09:07:20 -0400, Steven Rostedt wrote:
>> +
>> + if (pos->name && strchr(pos->name, ':')) {
>> + char *str = strchr(pos->name, ':');
>
> Why not make the above into:
>
> if (pos->name && (str = strchr(pos->name, ':'))) {
>
> ?

I wanted not to have an assignment in an if condition, just for my
preference. I will change it.

Thanks,
Namhyung

2013-04-24 10:39:40

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 14/14] perf tools: Add document for perf-ftrace command

On Tue, 23 Apr 2013 11:53:53 -0400, Steven Rostedt wrote:
> On Tue, 2013-04-23 at 17:31 +0900, Namhyung Kim wrote:
>> + 'perf ftrace live <command>' to see a live trace of kernel functions
>> + via trace_pipe during executing the <command>. If <command> is not
>> + specified, one of target options (-p, -a or -C) should be given. It
>
> one of *the* target options
>
>> + just print out the result to stdout and doesn't save any files.
>
> just *prints* out the result

Thank you for fixing the grammar! It's one of the hardest job for me to
write a document - even a short one. :)

Thanks,
Namhyung

2013-04-24 11:16:46

by zhangwei(Jovi)

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

On 2013/4/24 17:27, Namhyung Kim wrote:
> Hi Steve,
>
> On Tue, 23 Apr 2013 11:58:01 -0400, Steven Rostedt wrote:
>> On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
>>> Hello,
>>>
>>> This patchset implements a front-end tool for kernel's ftrace. It
>>> uses function_graph tracer by default and normal function tracer is
>>> also supported. (Of course you need to enable those tracers in your
>>> kernel first.)
>>>
>>> NOTE: It's in very early stage, so may contain many rough edges.
>>>
>>
>> Very nice Namhyung, thanks for doing this. I did a quick run through of
>> the patches and I have no complaints about them. I'm not sure how the
>> others will feel about it.
>
> Thanks!
>
>>
>> Also, have you given thought on how to execute both ftrace and the pmu
>> counters? That is, to get a way to interleave the data?
>
> I didn't think about it yet. I just thought enabling event tracing in
> ftrace and get the data along with the function tracing. I'm not sure
> how they interfere each other when enabled at the same time.
>
> If that's not a valid concern, I think it's doable.

There have ftrace:function event in tracing infrastructure, that event works in perf,
but I'm not sure if it could enhance to showing function graph as this patchset did.
perf record -e ftrace:function ls > /dev/null
perf report

>
>
>> I added a 'perf' clock to ftrace. I didn't see that being checked,
>> although I didn't look too hard. If possible, you can use that, as well
>> as the multi-buffers that are coming in 3.10.
>
> Will look at it later.
>
>>
>> I would also like to add a system call to allow you to get to the ftrace
>> data without the need for debugfs. I'm not sure if we should piggy back
>> on the perf system call, or add a new one. I'm thinking we should add a
>> new one so that it doesn't get too confusing.
>
> Could you elaborate on it more? I cannot see how it'll look like..
>
>>
>> Hope others have comments,
>
> Thanks for your kind review.
> Namhyung
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> .
>

2013-04-24 11:17:16

by zhangwei(Jovi)

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

On 2013/4/24 17:52, Namhyung Kim wrote:
> Hi Ingo,
>
> On Wed, 24 Apr 2013 08:50:18 +0200, Ingo Molnar wrote:
>> * Pekka Enberg <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> On Tue, 2013-04-23 at 17:30 +0900, Namhyung Kim wrote:
>>>>> This patchset implements a front-end tool for kernel's ftrace. It
>>>>> uses function_graph tracer by default and normal function tracer is
>>>>> also supported. (Of course you need to enable those tracers in your
>>>>> kernel first.)
>>>
>>> On Tue, Apr 23, 2013 at 6:58 PM, Steven Rostedt <[email protected]> wrote:
>>>> Very nice Namhyung, thanks for doing this. I did a quick run through of
>>>> the patches and I have no complaints about them. I'm not sure how the
>>>> others will feel about it.
>>>
>>> I also skimmed through the patches and they look like a reasonable
>>> starting point. So FWIW,
>>>
>>> Acked-by: Pekka Enberg <[email protected]>
>>>
>>> Are there any plans in trying to consolidate 'perf trace' and this? As
>>> a user I don't really care what the underlying mechanism and would
>>> like things to just work out of the box.
>>
>> Exactly. I'd love to see ftrace and perf trace unified.
>>
>> I have no objections against having a 'perf ftrace' interim step - as long
>> as it's really a temporary migration construct, not something we intend to
>> keep forever.
>
> Hmm.. okay.
>
> I think it'd be great if perf trace can cover both of kernel and user
> spaces. But function tracing in userspace looks impossible IMHO.
>
> So how about changing perf trace to receive proposed sub-commands
> (live/record/show/report) and following options:
>
> --kernel:: do ftrace (for kernel functions)
> --syscall:: do system call tracing (what perf trace does now)
>
"--kernel" make me think it's will trace all event tracepoints, not only ftrace.

for "--syscall" option, I always think we should enhance syscall tracing more than now,
like use annotate info when parse syscall event in perf, as other tracepoint does.
we also need read filename user string in kernel space when open syscall tracepoint hit,
record in perf.data, then perf parse it. After these functionality is implemented,
we will don't need a separate syscall tracing tool in perf any more,
just use: perf record -e "syscalls:" -a
>
> But I think we need to have an agreement on the file/directory format to
> support multiple recorder threads first. Any comments on this issue?
>
> Thanks,
> Namhyung
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> .
>

2013-04-24 12:33:15

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 01/14] perf util: Move debugfs/tracing helper functions to util.c

On Tue, Apr 23, 2013 at 05:30:59PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> Since they're generic helpers move them to util.c so that they can be
> used by others.

hum, we have sysfs object, maybe we want debugfs
as well to stay consistent.. even with interface:

for sysfs we have:
const char *sysfs_find_mountpoint(void);

for debugfs? :
const char *debugfs_find_mountpoint(void);

or we could unite all of them under fs object
as they are quite small ;-)

jirka

2013-04-24 12:43:19

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 02/14] perf util: Use evsel->name to get tracepoint_paths

On Tue, Apr 23, 2013 at 05:31:00PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> Most tracepoint events already have their system and event name in
> ->name field so that searching whole event tracing directory for each
> evsel to match given id is suboptimal.
>
> Cc: Jiri Olsa <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/trace-event-info.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
> index ab18bf12d54a..dcfc1869c9af 100644
> --- a/tools/perf/util/trace-event-info.c
> +++ b/tools/perf/util/trace-event-info.c
> @@ -414,12 +414,31 @@ get_tracepoints_path(struct list_head *pattrs)
> if (pos->attr.type != PERF_TYPE_TRACEPOINT)
> continue;
> ++nr_tracepoints;
> +
> + if (pos->name && strchr(pos->name, ':')) {
> + char *str = strchr(pos->name, ':');
> +
> + ppath->next = zalloc(sizeof(path));
> + if (!ppath->next)
> + goto error;
> +
> + ppath->next->system = strndup(pos->name, str - pos->name);
> + ppath->next->name = strdup(str+1);
> +
> + if (!ppath->next->system || !ppath->next->name)
> + goto error;
> +
> + goto next;
> + }

good idea, could you please move this ^^^ into function ?
something like 'tracepoint_name_to_path'

jirka

> +
> ppath->next = tracepoint_id_to_path(pos->attr.config);
> if (!ppath->next) {
> +error:
> pr_debug("No memory to alloc tracepoints list\n");
> put_tracepoints_path(&path);
> return NULL;
> }
> +next:
> ppath = ppath->next;
> }
>
> --
> 1.7.11.7
>

2013-04-24 14:09:38

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Tue, Apr 23, 2013 at 05:31:03PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>

SNIP

> +out:
> + put_tracing_file(file);
> + return ret;
> +}
> +
> +static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
> +{
> + if (write_tracing_file("tracing_on", "0") < 0)
> + return -1;
> +
> + if (write_tracing_file("current_tracer", "nop") < 0)
> + return -1;
> +
> + if (write_tracing_file("set_ftrace_pid", " ") < 0)
> + return -1;

I think we want to reset set_ftrace_pid file as well
probably to -1..

if I'm tracing command with this file having wrong CPU(s)
I dont get any data..

jirka

2013-04-24 14:13:04

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

On Tue, Apr 23, 2013 at 05:31:07PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> The ftrace record command is for saving raw ftrace buffer contents
> which can be get from per_cpu/cpuX/trace_pipe_raw.
>
> Since ftrace events are generated very frequently so single thread for
> recording mostly resulted in buffer overruns. Thus it uses per-cpu
> recorder thread to prevent such cases and they save the contents to
> their own files.
>
> These per-cpu data files are saved in a directory so that they can be
> easily found when needed. I chose the default directory name as
> "perf.data.dir" and the first two (i.e. "perf.data") can be changed
> with -o option. The structure of the directory looks like:
>
> $ tree perf.data.dir
> perf.data.dir/
> |-- perf.header
> |-- trace-cpu0.buf
> |-- trace-cpu1.buf
> |-- trace-cpu2.buf
> `-- trace-cpu3.buf
>
> In addition to trace-cpuX.buf files, it has perf.header file also.
> The perf.header file is compatible with existing perf.data format and
> contains usual event information, feature mask and sample data. The
> sample data is synthesized to indicate given cpu has a record file.

I got following lockup for record command:

# ./perf --no-pager ftrace record ls
...
hangs

in other terminal:

# pstack 14237
Thread 2 (Thread 0x7f3f1aa1d700 (LWP 14241)):
#0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0x21f9ce0) at builtin-ftrace.c:451
#2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f3f1ba209c0 (LWP 14237)):
#0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
#1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fff1377d240) at builtin-ftrace.c:682
#2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fff1377d818, argc=<optimized out>, ftrace=0x7fff1377d240) at builtin-ftrace.c:1535
#3 cmd_ftrace (argc=2, argv=0x7fff1377d818, prefix=<optimized out>) at builtin-ftrace.c:1655
#4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fff1377d818) at perf.c:320
#5 0x0000000000419faf in handle_internal_command (argv=0x7fff1377d818, argc=3) at perf.c:377
#6 run_argv (argv=0x7fff1377d600, argcp=0x7fff1377d60c) at perf.c:421
#7 main (argc=3, argv=0x7fff1377d818) at perf.c:522

jirka

2013-04-24 14:18:14

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Tue, Apr 23, 2013 at 05:31:10PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> It's convenient to use pager when seeing many lines of result.
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-ftrace.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index 97293cee5597..f4efa1475cf9 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -188,6 +188,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
> signal(SIGINT, sig_handler);
> signal(SIGUSR1, sig_handler);
> signal(SIGCHLD, sig_handler);
> + signal(SIGPIPE, sig_handler);
>
> if (setup_tracing_files(ftrace) < 0)
> goto out_reset;
> @@ -1461,6 +1462,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
> argv, false, true) < 0)
> goto out_maps;
>
> + setup_pager();
> +
> ret = do_ftrace_live(ftrace);
>
> out_maps:
> @@ -1582,6 +1585,8 @@ __cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, const char **argv)
> if (ftrace->dirname == NULL)
> ftrace->dirname = DEFAULT_DIRNAME;
>
> + setup_pager();
> +
> ret = do_ftrace_show(ftrace);
>
> perf_evlist__delete_maps(ftrace->evlist);
> @@ -1643,6 +1648,7 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, const char **argv)
> perf_hpp__column_enable(PERF_HPP__OVERHEAD);
> perf_hpp__init();
>
> + setup_pager();
> setup_sorting();
>
> symbol_conf.exclude_other = false;
> --
> 1.7.11.7
>

for some reason I'm able to get any output for 'live' command only
when using --no-pager option. I dont get any data for:

# ./perf ftrace live ls

not even ls output

But sometime I dont even get ftrace output for:

# ./perf --no-pager ftrace live ls

just ls output


I wonder there's some race with reading trace_pipe file,
I haven't checked deeply yet..

jirka

2013-04-25 05:46:08

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

Hi Jovi,

On Wed, 24 Apr 2013 19:14:49 +0800, zhangwei wrote:
> On 2013/4/24 17:27, Namhyung Kim wrote:
>>> Also, have you given thought on how to execute both ftrace and the pmu
>>> counters? That is, to get a way to interleave the data?
>>
>> I didn't think about it yet. I just thought enabling event tracing in
>> ftrace and get the data along with the function tracing. I'm not sure
>> how they interfere each other when enabled at the same time.
>>
>> If that's not a valid concern, I think it's doable.
>
> There have ftrace:function event in tracing infrastructure, that event works in perf,
> but I'm not sure if it could enhance to showing function graph as this patchset did.
> perf record -e ftrace:function ls > /dev/null
> perf report

Yes, I know that.

When I asked to Steve that how I can use ftrace:funcgraph_{entry,exit}
as tracepoint events in perf tools, he said it'd be a non-trivial job
and need some amount of scary work. :)

Thanks,
Namhyung

2013-04-25 05:53:56

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC 00/14] perf tools: Introduce new 'ftrace' command

On Wed, 24 Apr 2013 19:15:19 +0800, zhangwei wrote:
> On 2013/4/24 17:52, Namhyung Kim wrote:
>> I think it'd be great if perf trace can cover both of kernel and user
>> spaces. But function tracing in userspace looks impossible IMHO.
>>
>> So how about changing perf trace to receive proposed sub-commands
>> (live/record/show/report) and following options:
>>
>> --kernel:: do ftrace (for kernel functions)
>> --syscall:: do system call tracing (what perf trace does now)
>>
> "--kernel" make me think it's will trace all event tracepoints, not only ftrace.

Well, I'm not sure what's the right name. --kernel-function?

>
> for "--syscall" option, I always think we should enhance syscall tracing more than now,
> like use annotate info when parse syscall event in perf, as other tracepoint does.
> we also need read filename user string in kernel space when open syscall tracepoint hit,
> record in perf.data, then perf parse it. After these functionality is implemented,
> we will don't need a separate syscall tracing tool in perf any more,
> just use: perf record -e "syscalls:" -a

It's a different story. But I do think it'd better if there's a way to
see some of userspace strings like this.

Thanks,
Namhyung

2013-04-25 05:58:15

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 02/14] perf util: Use evsel->name to get tracepoint_paths

Hi Jiri,

On Wed, 24 Apr 2013 14:42:57 +0200, Jiri Olsa wrote:
> On Tue, Apr 23, 2013 at 05:31:00PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <[email protected]>
>>
>> Most tracepoint events already have their system and event name in
>> ->name field so that searching whole event tracing directory for each
>> evsel to match given id is suboptimal.
>>
>> Cc: Jiri Olsa <[email protected]>
>> Cc: Frederic Weisbecker <[email protected]>
>> Signed-off-by: Namhyung Kim <[email protected]>
>> ---
>> tools/perf/util/trace-event-info.c | 19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>>
>> diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
>> index ab18bf12d54a..dcfc1869c9af 100644
>> --- a/tools/perf/util/trace-event-info.c
>> +++ b/tools/perf/util/trace-event-info.c
>> @@ -414,12 +414,31 @@ get_tracepoints_path(struct list_head *pattrs)
>> if (pos->attr.type != PERF_TYPE_TRACEPOINT)
>> continue;
>> ++nr_tracepoints;
>> +
>> + if (pos->name && strchr(pos->name, ':')) {
>> + char *str = strchr(pos->name, ':');
>> +
>> + ppath->next = zalloc(sizeof(path));
>> + if (!ppath->next)
>> + goto error;
>> +
>> + ppath->next->system = strndup(pos->name, str - pos->name);
>> + ppath->next->name = strdup(str+1);
>> +
>> + if (!ppath->next->system || !ppath->next->name)
>> + goto error;
>> +
>> + goto next;
>> + }
>
> good idea, could you please move this ^^^ into function ?
> something like 'tracepoint_name_to_path'

Will do!

Thanks,
Namhyung

2013-04-25 06:06:40

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Wed, 24 Apr 2013 16:09:18 +0200, Jiri Olsa wrote:
> On Tue, Apr 23, 2013 at 05:31:03PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <[email protected]>
>> +out:
>> + put_tracing_file(file);
>> + return ret;
>> +}
>> +
>> +static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
>> +{
>> + if (write_tracing_file("tracing_on", "0") < 0)
>> + return -1;
>> +
>> + if (write_tracing_file("current_tracer", "nop") < 0)
>> + return -1;
>> +
>> + if (write_tracing_file("set_ftrace_pid", " ") < 0)
>> + return -1;
>
> I think we want to reset set_ftrace_pid file as well
> probably to -1..
>
> if I'm tracing command with this file having wrong CPU(s)
> I dont get any data..

Not sure I understood what you mean exactly.

Do you mean it should be changed to

if (write_tracing_file("set_ftrace_pid", "-1") < 0)

? It seems it's the right thing to do for compatibility.

Thanks,
Namhyung

2013-04-25 06:24:22

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

On Wed, 24 Apr 2013 16:12:41 +0200, Jiri Olsa wrote:
> I got following lockup for record command:
>
> # ./perf --no-pager ftrace record ls
> ...
> hangs
>
> in other terminal:
>
> # pstack 14237
> Thread 2 (Thread 0x7f3f1aa1d700 (LWP 14241)):
> #0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0x21f9ce0) at builtin-ftrace.c:451
> #2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
> #3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f3f1ba209c0 (LWP 14237)):
> #0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
> #1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fff1377d240) at builtin-ftrace.c:682
> #2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fff1377d818, argc=<optimized out>, ftrace=0x7fff1377d240) at builtin-ftrace.c:1535
> #3 cmd_ftrace (argc=2, argv=0x7fff1377d818, prefix=<optimized out>) at builtin-ftrace.c:1655
> #4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fff1377d818) at perf.c:320
> #5 0x0000000000419faf in handle_internal_command (argv=0x7fff1377d818, argc=3) at perf.c:377
> #6 run_argv (argv=0x7fff1377d600, argcp=0x7fff1377d60c) at perf.c:421
> #7 main (argc=3, argv=0x7fff1377d818) at perf.c:522
>

How often can you reproduce it? I can't. :(

But it looks there's a race between cond_wait() and cond_broatcast().
I'll take a look at that.

Thanks for the test and the report,
Namhyung

2013-04-25 06:50:50

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Wed, 24 Apr 2013 16:17:24 +0200, Jiri Olsa wrote:
> On Tue, Apr 23, 2013 at 05:31:10PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <[email protected]>
>>
>> It's convenient to use pager when seeing many lines of result.
>>
> for some reason I'm able to get any output for 'live' command only
> when using --no-pager option. I dont get any data for:
>
> # ./perf ftrace live ls
>
> not even ls output
>
> But sometime I dont even get ftrace output for:
>
> # ./perf --no-pager ftrace live ls
>
> just ls output
>
>
> I wonder there's some race with reading trace_pipe file,
> I haven't checked deeply yet..

It seems go_pipe in perf_evlist__prepare_workload() was closed due to
the pager process, so the workload ended up blocking indefinitely.

I guess other subcommands were not affected by this since record doesn't
use a pager, and show and report don't have a workload.

Does the patch below fix it?



diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 95f9a7fa3e04..3f3a684193e0 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1486,12 +1486,12 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(live_usage, live_options);

+ setup_pager();
+
ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
return ret;

- setup_pager();
-
ret = do_ftrace_live(ftrace);

ftrace_teardown(ftrace);

2013-04-25 09:10:26

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Thu, Apr 25, 2013 at 03:06:37PM +0900, Namhyung Kim wrote:
> On Wed, 24 Apr 2013 16:09:18 +0200, Jiri Olsa wrote:
> > On Tue, Apr 23, 2013 at 05:31:03PM +0900, Namhyung Kim wrote:
> >> From: Namhyung Kim <[email protected]>
> >> +out:
> >> + put_tracing_file(file);
> >> + return ret;
> >> +}
> >> +
> >> +static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
> >> +{
> >> + if (write_tracing_file("tracing_on", "0") < 0)
> >> + return -1;
> >> +
> >> + if (write_tracing_file("current_tracer", "nop") < 0)
> >> + return -1;
> >> +
> >> + if (write_tracing_file("set_ftrace_pid", " ") < 0)
> >> + return -1;
> >
> > I think we want to reset set_ftrace_pid file as well
> > probably to -1..
> >
> > if I'm tracing command with this file having wrong CPU(s)
> > I dont get any data..
>
> Not sure I understood what you mean exactly.

ugh, sry.. I wrote set_ftrace_pid when I actually meant tracing_cpumask ;-)

>
> Do you mean it should be changed to
>
> if (write_tracing_file("set_ftrace_pid", "-1") < 0)
>
> ? It seems it's the right thing to do for compatibility.


looks like it's enough only to open it with O_TRUNC,
so I guess whatever minimal goes write_tracing_file

---
kernel/trace/ftrace.c:

ftrace_pid_open(struct inode *inode, struct file *file)
{
int ret = 0;

if ((file->f_mode & FMODE_WRITE) &&
(file->f_flags & O_TRUNC))
ftrace_pid_reset();


jirka

2013-04-25 09:51:50

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Thu, 25 Apr 2013 11:09:56 +0200, Jiri Olsa wrote:
> On Thu, Apr 25, 2013 at 03:06:37PM +0900, Namhyung Kim wrote:
>> On Wed, 24 Apr 2013 16:09:18 +0200, Jiri Olsa wrote:
>> > On Tue, Apr 23, 2013 at 05:31:03PM +0900, Namhyung Kim wrote:
>> >> From: Namhyung Kim <[email protected]>
>> >> +out:
>> >> + put_tracing_file(file);
>> >> + return ret;
>> >> +}
>> >> +
>> >> +static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
>> >> +{
>> >> + if (write_tracing_file("tracing_on", "0") < 0)
>> >> + return -1;
>> >> +
>> >> + if (write_tracing_file("current_tracer", "nop") < 0)
>> >> + return -1;
>> >> +
>> >> + if (write_tracing_file("set_ftrace_pid", " ") < 0)
>> >> + return -1;
>> >
>> > I think we want to reset set_ftrace_pid file as well
>> > probably to -1..
>> >
>> > if I'm tracing command with this file having wrong CPU(s)
>> > I dont get any data..
>>
>> Not sure I understood what you mean exactly.
>
> ugh, sry.. I wrote set_ftrace_pid when I actually meant tracing_cpumask ;-)

Okay. :)

>
>>
>> Do you mean it should be changed to
>>
>> if (write_tracing_file("set_ftrace_pid", "-1") < 0)
>>
>> ? It seems it's the right thing to do for compatibility.
>
>
> looks like it's enough only to open it with O_TRUNC,
> so I guess whatever minimal goes write_tracing_file

Yeah, but I can see below from trace-cmd code.

trace-cmd/trace-record.c:

/*
* Older kernels required "-1" to disable pid
*/
if (ret < 0 && !strlen(pid))
ret = write(fd, "-1", 2);


Could you clarify this, Steve?

Thanks,
Namhyung

2013-04-25 13:05:25

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Thu, 2013-04-25 at 18:51 +0900, Namhyung Kim wrote:
> >
> >>
> >> Do you mean it should be changed to
> >>
> >> if (write_tracing_file("set_ftrace_pid", "-1") < 0)
> >>
> >> ? It seems it's the right thing to do for compatibility.
> >
> >
> > looks like it's enough only to open it with O_TRUNC,
> > so I guess whatever minimal goes write_tracing_file

Correct.

>
> Yeah, but I can see below from trace-cmd code.
>
> trace-cmd/trace-record.c:
>
> /*
> * Older kernels required "-1" to disable pid
> */
> if (ret < 0 && !strlen(pid))
> ret = write(fd, "-1", 2);
>
>
> Could you clarify this, Steve?

As the comment states, older kernels (before 2.6.33) required a -1. As
trace-cmd is set up to work with at least 2.6.32, it requires the "-1"
write. It doesn't hurt to have it.

But for perf ftrace (or whatever), I don't think we need the requirement
to work with 2.6.32 kernels. Perf was added in 2.6.31, but do all
functionalities still work that far back?

-- Steve

2013-04-25 15:46:55

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Thu, Apr 25, 2013 at 09:05:21AM -0400, Steven Rostedt wrote:
> On Thu, 2013-04-25 at 18:51 +0900, Namhyung Kim wrote:
> > >
> > >>
> > >> Do you mean it should be changed to
> > >>
> > >> if (write_tracing_file("set_ftrace_pid", "-1") < 0)
> > >>
> > >> ? It seems it's the right thing to do for compatibility.
> > >
> > >
> > > looks like it's enough only to open it with O_TRUNC,
> > > so I guess whatever minimal goes write_tracing_file
>
> Correct.
>
> >
> > Yeah, but I can see below from trace-cmd code.
> >
> > trace-cmd/trace-record.c:
> >
> > /*
> > * Older kernels required "-1" to disable pid
> > */
> > if (ret < 0 && !strlen(pid))
> > ret = write(fd, "-1", 2);
> >
> >
> > Could you clarify this, Steve?
>
> As the comment states, older kernels (before 2.6.33) required a -1. As
> trace-cmd is set up to work with at least 2.6.32, it requires the "-1"
> write. It doesn't hurt to have it.
>
> But for perf ftrace (or whatever), I don't think we need the requirement
> to work with 2.6.32 kernels. Perf was added in 2.6.31, but do all
> functionalities still work that far back?

perf maintains backward compatibility, but not sure how far
back it goes.. but as long as it's just writing -1 in here
let's do that

jirka

2013-04-26 08:35:39

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

On Thu, Apr 25, 2013 at 03:24:18PM +0900, Namhyung Kim wrote:
> On Wed, 24 Apr 2013 16:12:41 +0200, Jiri Olsa wrote:
> > I got following lockup for record command:
> >
> > # ./perf --no-pager ftrace record ls
> > ...
> > hangs
> >
> > in other terminal:
> >
> > # pstack 14237
> > Thread 2 (Thread 0x7f3f1aa1d700 (LWP 14241)):
> > #0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > #1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0x21f9ce0) at builtin-ftrace.c:451
> > #2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
> > #3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
> > Thread 1 (Thread 0x7f3f1ba209c0 (LWP 14237)):
> > #0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
> > #1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fff1377d240) at builtin-ftrace.c:682
> > #2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fff1377d818, argc=<optimized out>, ftrace=0x7fff1377d240) at builtin-ftrace.c:1535
> > #3 cmd_ftrace (argc=2, argv=0x7fff1377d818, prefix=<optimized out>) at builtin-ftrace.c:1655
> > #4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fff1377d818) at perf.c:320
> > #5 0x0000000000419faf in handle_internal_command (argv=0x7fff1377d818, argc=3) at perf.c:377
> > #6 run_argv (argv=0x7fff1377d600, argcp=0x7fff1377d60c) at perf.c:421
> > #7 main (argc=3, argv=0x7fff1377d818) at perf.c:522
> >
>
> How often can you reproduce it? I can't. :(

yesterday it was more often than today ;-) but I still get it
after several runs, this reproduces it for me:

# while [ 1 ]; do ./perf --no-pager ftrace record ls ; done

seems like there's no change in backtraces:

[root@krava tracing]# pstack 4139
Thread 2 (Thread 0x7f4aa37fe700 (LWP 4145)):
#0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0xe10e90) at builtin-ftrace.c:451
#2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f4aac29c9c0 (LWP 4139)):
#0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
#1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fffaf9bc080) at builtin-ftrace.c:682
#2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fffaf9bc658, argc=<optimized out>, ftrace=0x7fffaf9bc080) at builtin-ftrace.c:1535
#3 cmd_ftrace (argc=2, argv=0x7fffaf9bc658, prefix=<optimized out>) at builtin-ftrace.c:1655
#4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fffaf9bc658) at perf.c:320
#5 0x0000000000419faf in handle_internal_command (argv=0x7fffaf9bc658, argc=3) at perf.c:377
#6 run_argv (argv=0x7fffaf9bc440, argcp=0x7fffaf9bc44c) at perf.c:421
#7 main (argc=3, argv=0x7fffaf9bc658) at perf.c:522

---

#0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0xee9ce0) at builtin-ftrace.c:451
#2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f9b383139c0 (LWP 4397)):
#0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
#1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fffdea14e50) at builtin-ftrace.c:682
#2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fffdea15428, argc=<optimized out>, ftrace=0x7fffdea14e50) at builtin-ftrace.c:1535
#3 cmd_ftrace (argc=2, argv=0x7fffdea15428, prefix=<optimized out>) at builtin-ftrace.c:1655
#4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fffdea15428) at perf.c:320
#5 0x0000000000419faf in handle_internal_command (argv=0x7fffdea15428, argc=3) at perf.c:377
#6 run_argv (argv=0x7fffdea15210, argcp=0x7fffdea1521c) at perf.c:421
#7 main (argc=3, argv=0x7fffdea15428) at perf.c:522


jirka

2013-04-26 08:40:44

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Thu, Apr 25, 2013 at 03:50:45PM +0900, Namhyung Kim wrote:
> On Wed, 24 Apr 2013 16:17:24 +0200, Jiri Olsa wrote:
> > On Tue, Apr 23, 2013 at 05:31:10PM +0900, Namhyung Kim wrote:
> >> From: Namhyung Kim <[email protected]>
> >>
> >> It's convenient to use pager when seeing many lines of result.
> >>
> > for some reason I'm able to get any output for 'live' command only
> > when using --no-pager option. I dont get any data for:
> >
> > # ./perf ftrace live ls
> >
> > not even ls output

ok, now with your patch I get allways ls output correctly in pager

> >
> > But sometime I dont even get ftrace output for:
> >
> > # ./perf --no-pager ftrace live ls
> >
> > just ls output

this one still persists

jirka

2013-04-26 13:35:01

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On 4/25/13 7:05 AM, Steven Rostedt wrote:
> Perf was added in 2.6.31, but do all
> functionalities still work that far back?


perf was accepted into 2.6.28 - first commit is 0793a61d.

It does have limited functionality (H/W and S/W counters) until 2.6.31.

David

2013-04-26 13:44:24

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

On 4/25/13 12:24 AM, Namhyung Kim wrote:
> But it looks there's a race between cond_wait() and cond_broatcast().
> I'll take a look at that.

Why not use eventfd or a pipe for the signalling instead?

David

2013-04-26 13:51:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 05/14] perf tools: Introduce new 'ftrace' tool

On Fri, 2013-04-26 at 07:34 -0600, David Ahern wrote:
> On 4/25/13 7:05 AM, Steven Rostedt wrote:
> > Perf was added in 2.6.31, but do all
> > functionalities still work that far back?
>
>
> perf was accepted into 2.6.28 - first commit is 0793a61d.

That's kernel side.

>
> It does have limited functionality (H/W and S/W counters) until 2.6.31.

The first commit I found by doing "git log tools/perf" and going to the
end was:

864709302a80f26fa9da3be5b47304f0b8bae192

But I now see that was moving perf from documentation to tools, so that
was incorrect. But even going back farther than that, of commit
d24e473e5b2ca86d1288b9416227ccc603313d0f which adds the first
development of perf in the Documentation directory, I still get:

git describe --contains d24e473e5b2ca86d1288b9416227ccc603313d0f
v2.6.31-rc1~383^2~367

I still stand by the fact that perf was not added until 2.6.31.

-- Steve

2013-05-06 01:45:04

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

Hi Jiri,

Sorry for late reply. I've been busy these days.

On Thu, 25 Apr 2013 15:24:18 +0900, Namhyung Kim wrote:
> On Wed, 24 Apr 2013 16:12:41 +0200, Jiri Olsa wrote:
>> I got following lockup for record command:
>>
>> # ./perf --no-pager ftrace record ls
>> ...
>> hangs
>>
>> in other terminal:
>>
>> # pstack 14237
>> Thread 2 (Thread 0x7f3f1aa1d700 (LWP 14241)):
>> #0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>> #1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0x21f9ce0) at builtin-ftrace.c:451
>> #2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
>> #3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
>> Thread 1 (Thread 0x7f3f1ba209c0 (LWP 14237)):
>> #0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
>> #1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fff1377d240) at builtin-ftrace.c:682
>> #2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fff1377d818, argc=<optimized out>, ftrace=0x7fff1377d240) at builtin-ftrace.c:1535
>> #3 cmd_ftrace (argc=2, argv=0x7fff1377d818, prefix=<optimized out>) at builtin-ftrace.c:1655
>> #4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fff1377d818) at perf.c:320
>> #5 0x0000000000419faf in handle_internal_command (argv=0x7fff1377d818, argc=3) at perf.c:377
>> #6 run_argv (argv=0x7fff1377d600, argcp=0x7fff1377d60c) at perf.c:421
>> #7 main (argc=3, argv=0x7fff1377d818) at perf.c:522
>>
>
> How often can you reproduce it? I can't. :(
>
> But it looks there's a race between cond_wait() and cond_broatcast().
> I'll take a look at that.

I think the problem is that a recorder can be blocked if it received a
broadcast signal before calling cond_wait(). So I modified the code to
place signaling ready_cond and waiting start_cond in the same cs.

Does the patch below fix your problem?

Thanks,
Namhyung


diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 4296948197de..f1ecfa207822 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -428,12 +428,6 @@ static void *record_ftrace_raw_buffer(void *arg)

fra->state = RECORD_STATE__READY;

- /* Now I'm ready */
- pthread_mutex_lock(&recorder_mutex);
- recorder_count++;
- pthread_cond_signal(&recorder_ready_cond);
- pthread_mutex_unlock(&recorder_mutex);
-
/*
* block all signals but SIGUSR2.
* It'll be used to unblock a recorder to finish.
@@ -447,7 +441,10 @@ static void *record_ftrace_raw_buffer(void *arg)
fcntl(trace_fd, F_SETFL, O_NONBLOCK);
pollfd.fd = trace_fd;

+ /* Now I'm ready */
pthread_mutex_lock(&recorder_mutex);
+ recorder_count++;
+ pthread_cond_signal(&recorder_ready_cond);
pthread_cond_wait(&recorder_start_cond, &recorder_mutex);
pthread_mutex_unlock(&recorder_mutex);

2013-05-06 01:52:50

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Fri, 26 Apr 2013 10:40:23 +0200, Jiri Olsa wrote:
> On Thu, Apr 25, 2013 at 03:50:45PM +0900, Namhyung Kim wrote:
>> On Wed, 24 Apr 2013 16:17:24 +0200, Jiri Olsa wrote:
>> > On Tue, Apr 23, 2013 at 05:31:10PM +0900, Namhyung Kim wrote:
>> >> From: Namhyung Kim <[email protected]>
>> >>
>> >> It's convenient to use pager when seeing many lines of result.
>> >>
>> > for some reason I'm able to get any output for 'live' command only
>> > when using --no-pager option. I dont get any data for:
>> >
>> > # ./perf ftrace live ls
>> >
>> > not even ls output
>
> ok, now with your patch I get allways ls output correctly in pager

"ls output" only? What about the ftrace outputs?

>
>> >
>> > But sometime I dont even get ftrace output for:
>> >
>> > # ./perf --no-pager ftrace live ls
>> >
>> > just ls output
>
> this one still persists

Hmm.. still hard to reproduce.. :(

As it's --no-pager mode, so that it can be reproduced for you without
this patch, right?

Thanks,
Namhyung

2013-05-06 01:57:17

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

Hi David,

On Fri, 26 Apr 2013 07:44:18 -0600, David Ahern wrote:
> On 4/25/13 12:24 AM, Namhyung Kim wrote:
>> But it looks there's a race between cond_wait() and cond_broatcast().
>> I'll take a look at that.
>
> Why not use eventfd or a pipe for the signalling instead?

Thanks for your feedback!

I didn't have an idea of using eventfd for this case. Does it helpful
to prevent such a deadlock like this?

Thanks,
Namhyung

2013-05-06 03:22:37

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

On 5/5/13 7:57 PM, Namhyung Kim wrote:
> Hi David,
>
> On Fri, 26 Apr 2013 07:44:18 -0600, David Ahern wrote:
>> On 4/25/13 12:24 AM, Namhyung Kim wrote:
>>> But it looks there's a race between cond_wait() and cond_broatcast().
>>> I'll take a look at that.
>>
>> Why not use eventfd or a pipe for the signalling instead?
>
> Thanks for your feedback!
>
> I didn't have an idea of using eventfd for this case. Does it helpful
> to prevent such a deadlock like this?

I have used it to replace pthread_cond_wait /
pthread_cond_{signal,broadcast} -- lighter and faster. It was added in
2.6.27 I believe, so will be available for any system that supports perf
(record side). For people compiling perf against say RHEL5 (e.g.
analysis) you will need a check as to whether it exists.

David

2013-05-06 11:38:39

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Mon, May 06, 2013 at 10:52:46AM +0900, Namhyung Kim wrote:
> On Fri, 26 Apr 2013 10:40:23 +0200, Jiri Olsa wrote:
> > On Thu, Apr 25, 2013 at 03:50:45PM +0900, Namhyung Kim wrote:
> >> On Wed, 24 Apr 2013 16:17:24 +0200, Jiri Olsa wrote:
> >> > On Tue, Apr 23, 2013 at 05:31:10PM +0900, Namhyung Kim wrote:
> >> >> From: Namhyung Kim <[email protected]>
> >> >>
> >> >> It's convenient to use pager when seeing many lines of result.
> >> >>
> >> > for some reason I'm able to get any output for 'live' command only
> >> > when using --no-pager option. I dont get any data for:
> >> >
> >> > # ./perf ftrace live ls
> >> >
> >> > not even ls output
> >
> > ok, now with your patch I get allways ls output correctly in pager
>
> "ls output" only? What about the ftrace outputs?
>
> >
> >> >
> >> > But sometime I dont even get ftrace output for:
> >> >
> >> > # ./perf --no-pager ftrace live ls
> >> >
> >> > just ls output
> >
> > this one still persists
>
> Hmm.. still hard to reproduce.. :(
>
> As it's --no-pager mode, so that it can be reproduced for you without
> this patch, right?

right, I still see the second issue (not getting ftrace output)
regardless of the patch being applied or not

jirka

2013-05-06 12:04:35

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

On Mon, May 06, 2013 at 10:44:56AM +0900, Namhyung Kim wrote:
> Hi Jiri,
>
> Sorry for late reply. I've been busy these days.
>
> On Thu, 25 Apr 2013 15:24:18 +0900, Namhyung Kim wrote:
> > On Wed, 24 Apr 2013 16:12:41 +0200, Jiri Olsa wrote:
> >> I got following lockup for record command:
> >>
> >> # ./perf --no-pager ftrace record ls
> >> ...
> >> hangs
> >>
> >> in other terminal:
> >>
> >> # pstack 14237
> >> Thread 2 (Thread 0x7f3f1aa1d700 (LWP 14241)):
> >> #0 0x0000003cec20b595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> #1 0x0000000000446ebe in record_ftrace_raw_buffer (arg=0x21f9ce0) at builtin-ftrace.c:451
> >> #2 0x0000003cec207d14 in start_thread () from /lib64/libpthread.so.0
> >> #3 0x0000003cebaf168d in clone () from /lib64/libc.so.6
> >> Thread 1 (Thread 0x7f3f1ba209c0 (LWP 14237)):
> >> #0 0x0000003cec208e60 in pthread_join () from /lib64/libpthread.so.0
> >> #1 0x0000000000449c8c in do_ftrace_record (ftrace=ftrace@entry=0x7fff1377d240) at builtin-ftrace.c:682
> >> #2 0x000000000044a57b in __cmd_ftrace_record (argv=0x7fff1377d818, argc=<optimized out>, ftrace=0x7fff1377d240) at builtin-ftrace.c:1535
> >> #3 cmd_ftrace (argc=2, argv=0x7fff1377d818, prefix=<optimized out>) at builtin-ftrace.c:1655
> >> #4 0x000000000041a763 in run_builtin (p=p@entry=0x7d58a0, argc=argc@entry=3, argv=argv@entry=0x7fff1377d818) at perf.c:320
> >> #5 0x0000000000419faf in handle_internal_command (argv=0x7fff1377d818, argc=3) at perf.c:377
> >> #6 run_argv (argv=0x7fff1377d600, argcp=0x7fff1377d60c) at perf.c:421
> >> #7 main (argc=3, argv=0x7fff1377d818) at perf.c:522
> >>
> >
> > How often can you reproduce it? I can't. :(
> >
> > But it looks there's a race between cond_wait() and cond_broatcast().
> > I'll take a look at that.
>
> I think the problem is that a recorder can be blocked if it received a
> broadcast signal before calling cond_wait(). So I modified the code to
> place signaling ready_cond and waiting start_cond in the same cs.
>
> Does the patch below fix your problem?

yes.. without the patch I get lockup in like 10 minutes
with the patch the test is now ok for almost an hour..

jirka

2013-05-08 05:27:45

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 09/14] perf ftrace: Add 'record' sub-command

Hi Jiri,

On Mon, 6 May 2013 14:04:11 +0200, Jiri Olsa wrote:
> On Mon, May 06, 2013 at 10:44:56AM +0900, Namhyung Kim wrote:
>> Hi Jiri,
>>
>> Sorry for late reply. I've been busy these days.
>>
>> On Thu, 25 Apr 2013 15:24:18 +0900, Namhyung Kim wrote:
>> > But it looks there's a race between cond_wait() and cond_broatcast().
>> > I'll take a look at that.
>>
>> I think the problem is that a recorder can be blocked if it received a
>> broadcast signal before calling cond_wait(). So I modified the code to
>> place signaling ready_cond and waiting start_cond in the same cs.
>>
>> Does the patch below fix your problem?
>
> yes.. without the patch I get lockup in like 10 minutes
> with the patch the test is now ok for almost an hour..

Many thanks for your test!

Namhyung

2013-05-08 05:49:13

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Mon, 6 May 2013 13:38:19 +0200, Jiri Olsa wrote:
> On Mon, May 06, 2013 at 10:52:46AM +0900, Namhyung Kim wrote:
>> On Fri, 26 Apr 2013 10:40:23 +0200, Jiri Olsa wrote:
>> > On Thu, Apr 25, 2013 at 03:50:45PM +0900, Namhyung Kim wrote:
>> >> On Wed, 24 Apr 2013 16:17:24 +0200, Jiri Olsa wrote:
>> >> > On Tue, Apr 23, 2013 at 05:31:10PM +0900, Namhyung Kim wrote:
>> >> >> From: Namhyung Kim <[email protected]>
>> >> >>
>> >> >> It's convenient to use pager when seeing many lines of result.
>> >> >>
>> >> > for some reason I'm able to get any output for 'live' command only
>> >> > when using --no-pager option. I dont get any data for:
>> >> >
>> >> > # ./perf ftrace live ls
>> >> >
>> >> > not even ls output
>> >
>> > ok, now with your patch I get allways ls output correctly in pager
>>
>> "ls output" only? What about the ftrace outputs?
>>
>> >
>> >> >
>> >> > But sometime I dont even get ftrace output for:
>> >> >
>> >> > # ./perf --no-pager ftrace live ls
>> >> >
>> >> > just ls output
>> >
>> > this one still persists
>>
>> Hmm.. still hard to reproduce.. :(
>>
>> As it's --no-pager mode, so that it can be reproduced for you without
>> this patch, right?
>
> right, I still see the second issue (not getting ftrace output)
> regardless of the patch being applied or not

Did you get the same result always? What's the value of tracing_cpumask?

Could you please apply this debug patch below and run it again?

Thank you very much for your help!
Namhyung


diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index f1ecfa207822..6b7aeae2dfa5 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -40,9 +40,11 @@ struct perf_ftrace {
};

static bool done;
+static int _signal;

static void sig_handler(int sig __maybe_unused)
{
+ _signal = sig;
done = true;
}

@@ -218,6 +220,20 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)

perf_evlist__start_workload(ftrace->evlist);

+ BUG_ON(ftrace->evlist->threads->map[0] != ftrace->evlist->workload.pid);
+
+ trace_file = get_tracing_file("tracing_on");
+ if (trace_file) {
+ int fd = open(trace_file, O_RDONLY);
+
+ BUG_ON(fd < 0);
+ BUG_ON(read(fd, buf, sizeof(buf)) < 0);
+ BUG_ON(buf[0] != '1');
+
+ close(fd);
+ put_tracing_file(trace_file);
+ }
+
while (!done) {
if (poll(&pollfd, 1, -1) < 0)
break;
@@ -231,6 +247,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
}
}

+ psignal(_signal, "perf");
write_tracing_file("tracing_on", "0");

/* read remaining buffer contents */

2013-05-10 09:04:25

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Wed, May 08, 2013 at 02:49:10PM +0900, Namhyung Kim wrote:

SNIP

>
>
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index f1ecfa207822..6b7aeae2dfa5 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -40,9 +40,11 @@ struct perf_ftrace {
> };
>
> static bool done;
> +static int _signal;
>
> static void sig_handler(int sig __maybe_unused)
> {
> + _signal = sig;
> done = true;
> }
>
> @@ -218,6 +220,20 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
>
> perf_evlist__start_workload(ftrace->evlist);
>
> + BUG_ON(ftrace->evlist->threads->map[0] != ftrace->evlist->workload.pid);
> +
> + trace_file = get_tracing_file("tracing_on");
> + if (trace_file) {
> + int fd = open(trace_file, O_RDONLY);
> +
> + BUG_ON(fd < 0);
> + BUG_ON(read(fd, buf, sizeof(buf)) < 0);
> + BUG_ON(buf[0] != '1');
> +
> + close(fd);
> + put_tracing_file(trace_file);
> + }
> +
> while (!done) {
> if (poll(&pollfd, 1, -1) < 0)
> break;
> @@ -231,6 +247,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
> }
> }
>
> + psignal(_signal, "perf");
> write_tracing_file("tracing_on", "0");
>
> /* read remaining buffer contents */
>
>

yep, that did it.. seems I'm getting the ftrace output
all the time, with:

[jolsa@krava perf]$ sudo ./perf --no-pager ftrace live ls -l /usr/ 2>&1 | grep perf
perf: Child exited
[jolsa@krava perf]$ sudo ./perf --no-pager ftrace live ls -l /usr/ 2>&1 | grep perf
1) 0.030 us | perf: Child exited
[jolsa@krava perf]$ sudo ./perf --no-pager ftrace live ls -l /usr/ 2>&1 | grep perf
1) perf: Child exited
[jolsa@krava perf]$ sudo ./perf --no-pager ftrace live ls -l /usr/ 2>&1 | grep perf
2) perf: Child exited
[jolsa@krava perf]$ sudo ./perf --no-pager ftrace live ls -l /usr/ 2>&1 | grep perf
1) 4.549 us perf: Child exited
^[[A[jolsa@krava perf]$ sudo ./perf --no-pager ftrace live ls -l /usr/ 2>&1 | grep perf
1) 0.024 us | perf: Child exited


thanks,
jirka

2013-05-13 08:53:25

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

Hi Jiri,

On Fri, 10 May 2013 11:04:05 +0200, Jiri Olsa wrote:
> On Wed, May 08, 2013 at 02:49:10PM +0900, Namhyung Kim wrote:
> yep, that did it.. seems I'm getting the ftrace output
> all the time, with:

Hmm.. this was just a debug patch. So can you get the ftrace output
without this patch too? I suspect there's an unknown problem regarding
stdio behind my back otherwise.

Thanks,
Namhyung

2013-05-13 13:42:51

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Mon, May 13, 2013 at 05:53:22PM +0900, Namhyung Kim wrote:
> Hi Jiri,
>
> On Fri, 10 May 2013 11:04:05 +0200, Jiri Olsa wrote:
> > On Wed, May 08, 2013 at 02:49:10PM +0900, Namhyung Kim wrote:
> > yep, that did it.. seems I'm getting the ftrace output
> > all the time, with:
>
> Hmm.. this was just a debug patch. So can you get the ftrace output
> without this patch too? I suspect there's an unknown problem regarding
> stdio behind my back otherwise.

I guess, but I cannot reproduce now even with or without
the patch.. aarghh ;) strange

I'll retry whatever new respin you'll do..

jirka

2013-05-14 01:03:07

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 12/14] perf ftrace: Use pager for displaying result

On Mon, 13 May 2013 15:42:28 +0200, Jiri Olsa wrote:
> On Mon, May 13, 2013 at 05:53:22PM +0900, Namhyung Kim wrote:
>> Hi Jiri,
>>
>> On Fri, 10 May 2013 11:04:05 +0200, Jiri Olsa wrote:
>> > On Wed, May 08, 2013 at 02:49:10PM +0900, Namhyung Kim wrote:
>> > yep, that did it.. seems I'm getting the ftrace output
>> > all the time, with:
>>
>> Hmm.. this was just a debug patch. So can you get the ftrace output
>> without this patch too? I suspect there's an unknown problem regarding
>> stdio behind my back otherwise.
>
> I guess, but I cannot reproduce now even with or without
> the patch.. aarghh ;) strange
>
> I'll retry whatever new respin you'll do..

Okay, thanks. I'll respin it soon.

Thanks,
Namhyung