Hello,
This is what Brendan requested on the perf-users mailing list [1] to
support FlameGraphs [2] more efficiently. This patchset adds a few
more callchain options to adjust the output for it.
* changes in v4)
- add missing doc update
- cleanup/fix callchain value print code
- add Acked-by from Brendan and Jiri
* changes in v3)
- put the value before callchains
- fix compile error
At first, 'folded' output mode was added. The folded output puts the
value, a space and all calchain nodes separated by semicolons. Now it
only supports --stdio as other UI provides some way of folding and/or
expanding callchains dynamically.
The value is now can be one of 'percent', 'period', or 'count'. The
percent is current default output and the period is the raw number of
sample periods. The count is the number of samples for each callchain.
Here's an example:
$ perf report --no-children --show-nr-samples --stdio -g folded,count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--28.63%-- start_secondary
|
--11.30%-- rest_init
$ perf report --no-children --stdio --show-total-period -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--9334403-- start_secondary
|
--3684302-- rest_init
$ perf report --no-children --stdio --show-nr-samples -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--57-- start_secondary
|
--23-- rest_init
You can get it from 'perf/callchain-fold-v4' branch on my tree:
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Any comments are welcome, thanks
Namhyung
[1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
[2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
Namhyung Kim (4):
perf report: Support folded callchain mode on --stdio
perf callchain: Abstract callchain print function
perf callchain: Add count fields to struct callchain_node
perf report: Add callchain value option
tools/perf/Documentation/perf-report.txt | 13 +++--
tools/perf/builtin-report.c | 4 +-
tools/perf/ui/browsers/hists.c | 8 +--
tools/perf/ui/gtk/hists.c | 8 +--
tools/perf/ui/stdio/hist.c | 93 ++++++++++++++++++++++++++------
tools/perf/util/callchain.c | 87 +++++++++++++++++++++++++++++-
tools/perf/util/callchain.h | 24 ++++++++-
tools/perf/util/util.c | 3 +-
8 files changed, 205 insertions(+), 35 deletions(-)
--
2.6.2
Add new call chain option (-g) 'folded' to print callchains in a line.
The callchains are separated by semicolons, and preceded by (absolute)
percent values and a space.
For example, following 20 lines can be printed in 3 lines with the
folded output mode;
$ perf report -g flat --no-children | grep -v ^# | head -20
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
5.88%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
$ perf report -g folded --no-children | grep -v ^# | head -3
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
This mode is supported only for --stdio now and intended to be used by
some scripts like in FlameGraphs[1]. Support for other UI might be
added later.
[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
Requested-by: Brendan Gregg <[email protected]>
Acked-by: Brendan Gregg <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 1 +
tools/perf/ui/stdio/hist.c | 54 ++++++++++++++++++++++++++++++++
tools/perf/util/callchain.c | 6 ++++
tools/perf/util/callchain.h | 5 +--
4 files changed, 64 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 5ce8da1e1256..f7d81aac9188 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -181,6 +181,7 @@ OPTIONS
- graph: use a graph tree, displaying absolute overhead rates. (default)
- fractal: like graph, but displays relative rates. Each branch of
the tree is considered as a new profiled object.
+ - folded: call chains are displayed in a line, separated by semicolons
- none: disable call chain display.
threshold is a percentage value which specifies a minimum percent to be
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index dfcbc90146ef..d04c068f4acf 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -260,6 +260,57 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
return ret;
}
+static size_t __callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
+{
+ struct callchain_list *chain;
+ size_t ret = 0;
+ char bf[1024];
+ bool first;
+
+ if (!node)
+ return 0;
+
+ ret += __callchain__fprintf_folded(fp, node->parent);
+
+ first = (ret == 0);
+ list_for_each_entry(chain, &node->val, list) {
+ if (chain->ip >= PERF_CONTEXT_MAX)
+ continue;
+ ret += fprintf(fp, "%s%s", first ? "" : ";",
+ callchain_list__sym_name(chain,
+ bf, sizeof(bf), false));
+ first = false;
+ }
+
+ return ret;
+}
+
+static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
+ u64 total_samples)
+{
+ size_t ret = 0;
+ u32 entries_printed = 0;
+ struct callchain_node *chain;
+ struct rb_node *rb_node = rb_first(tree);
+
+ while (rb_node) {
+ double percent;
+
+ chain = rb_entry(rb_node, struct callchain_node, rb_node);
+ percent = chain->hit * 100.0 / total_samples;
+
+ ret += fprintf(fp, "%.2f%% ", percent);
+ ret += __callchain__fprintf_folded(fp, chain);
+ ret += fprintf(fp, "\n");
+ if (++entries_printed == callchain_param.print_limit)
+ break;
+
+ rb_node = rb_next(rb_node);
+ }
+
+ return ret;
+}
+
static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
u64 total_samples, int left_margin,
FILE *fp)
@@ -278,6 +329,9 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
case CHAIN_FLAT:
return callchain__fprintf_flat(fp, &he->sorted_chain, total_samples);
break;
+ case CHAIN_FOLDED:
+ return callchain__fprintf_folded(fp, &he->sorted_chain, total_samples);
+ break;
case CHAIN_NONE:
break;
default:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 735ad48e1858..08cb220ba5ea 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -44,6 +44,10 @@ static int parse_callchain_mode(const char *value)
callchain_param.mode = CHAIN_GRAPH_REL;
return 0;
}
+ if (!strncmp(value, "folded", strlen(value))) {
+ callchain_param.mode = CHAIN_FOLDED;
+ return 0;
+ }
return -1;
}
@@ -218,6 +222,7 @@ rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,
switch (mode) {
case CHAIN_FLAT:
+ case CHAIN_FOLDED:
if (rnode->hit < chain->hit)
p = &(*p)->rb_left;
else
@@ -338,6 +343,7 @@ int callchain_register_param(struct callchain_param *param)
param->sort = sort_chain_graph_rel;
break;
case CHAIN_FLAT:
+ case CHAIN_FOLDED:
param->sort = sort_chain_flat;
break;
case CHAIN_NONE:
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index fce8161e54db..544d99ac169c 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -24,7 +24,7 @@
#define CALLCHAIN_RECORD_HELP CALLCHAIN_HELP RECORD_MODE_HELP RECORD_SIZE_HELP
#define CALLCHAIN_REPORT_HELP \
- HELP_PAD "print_type:\tcall graph printing style (graph|flat|fractal|none)\n" \
+ HELP_PAD "print_type:\tcall graph printing style (graph|flat|fractal|folded|none)\n" \
HELP_PAD "threshold:\tminimum call graph inclusion threshold (<percent>)\n" \
HELP_PAD "print_limit:\tmaximum number of call graph entry (<number>)\n" \
HELP_PAD "order:\t\tcall graph order (caller|callee)\n" \
@@ -43,7 +43,8 @@ enum chain_mode {
CHAIN_NONE,
CHAIN_FLAT,
CHAIN_GRAPH_ABS,
- CHAIN_GRAPH_REL
+ CHAIN_GRAPH_REL,
+ CHAIN_FOLDED,
};
enum chain_order {
--
2.6.2
This is a preparation to support for printing other type of callchain
value like count or period.
Acked-by: Brendan Gregg <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 8 +++++---
tools/perf/ui/gtk/hists.c | 8 ++------
tools/perf/ui/stdio/hist.c | 35 +++++++++++++++++------------------
tools/perf/util/callchain.c | 29 +++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 4 ++++
5 files changed, 57 insertions(+), 27 deletions(-)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e5afb8936040..a8897aab4c4a 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -592,7 +592,6 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
while (node) {
struct callchain_node *child = rb_entry(node, struct callchain_node, rb_node);
struct rb_node *next = rb_next(node);
- u64 cumul = callchain_cumul_hits(child);
struct callchain_list *chain;
char folded_sign = ' ';
int first = true;
@@ -619,9 +618,12 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
browser->show_dso);
if (was_first && need_percent) {
- double percent = cumul * 100.0 / total;
+ char buf[64];
- if (asprintf(&alloc_str, "%2.2f%% %s", percent, str) < 0)
+ callchain_node__sprintf_value(child, buf, sizeof(buf),
+ total);
+
+ if (asprintf(&alloc_str, "%s %s", buf, str) < 0)
str = "Not enough memory!";
else
str = alloc_str;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4b3585eed1e8..d8037b7023e8 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -100,14 +100,10 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
struct callchain_list *chain;
GtkTreeIter iter, new_parent;
bool need_new_parent;
- double percent;
- u64 hits, child_total;
+ u64 child_total;
node = rb_entry(nd, struct callchain_node, rb_node);
- hits = callchain_cumul_hits(node);
- percent = 100.0 * hits / total;
-
new_parent = *parent;
need_new_parent = !has_single_node && (node->val_nr > 1);
@@ -116,7 +112,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
gtk_tree_store_append(store, &iter, &new_parent);
- scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
+ callchain_node__sprintf_value(node, buf, sizeof(buf), total);
gtk_tree_store_set(store, &iter, 0, buf, -1);
callchain_list__sym_name(chain, buf, sizeof(buf), false);
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index d04c068f4acf..9b5a0d8a43dc 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -34,10 +34,10 @@ static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
return ret;
}
-static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
+static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
+ struct callchain_list *chain,
int depth, int depth_mask, int period,
- u64 total_samples, u64 hits,
- int left_margin)
+ u64 total_samples, int left_margin)
{
int i;
size_t ret = 0;
@@ -50,10 +50,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
else
ret += fprintf(fp, " ");
if (!period && i == depth - 1) {
- double percent;
-
- percent = hits * 100.0 / total_samples;
- ret += percent_color_fprintf(fp, "--%2.2f%%-- ", percent);
+ ret += fprintf(fp, "--");
+ ret += callchain_node__fprintf_value(node, fp, total_samples);
+ ret += fprintf(fp, "--");
} else
ret += fprintf(fp, "%s", " ");
}
@@ -120,10 +119,9 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
left_margin);
i = 0;
list_for_each_entry(chain, &child->val, list) {
- ret += ipchain__fprintf_graph(fp, chain, depth,
+ ret += ipchain__fprintf_graph(fp, child, chain, depth,
new_depth_mask, i++,
total_samples,
- cumul,
left_margin);
}
@@ -143,14 +141,17 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
if (callchain_param.mode == CHAIN_GRAPH_REL &&
remaining && remaining != total_samples) {
+ struct callchain_node rem_node = {
+ .hit = remaining,
+ };
if (!rem_sq_bracket)
return ret;
new_depth_mask &= ~(1 << (depth - 1));
- ret += ipchain__fprintf_graph(fp, &rem_hits, depth,
+ ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
new_depth_mask, 0, total_samples,
- remaining, left_margin);
+ left_margin);
}
return ret;
@@ -243,12 +244,11 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
struct rb_node *rb_node = rb_first(tree);
while (rb_node) {
- double percent;
-
chain = rb_entry(rb_node, struct callchain_node, rb_node);
- percent = chain->hit * 100.0 / total_samples;
- ret = percent_color_fprintf(fp, " %6.2f%%\n", percent);
+ ret += fprintf(fp, " ");
+ ret += callchain_node__fprintf_value(chain, fp, total_samples);
+ ret += fprintf(fp, "\n");
ret += __callchain__fprintf_flat(fp, chain, total_samples);
ret += fprintf(fp, "\n");
if (++entries_printed == callchain_param.print_limit)
@@ -294,12 +294,11 @@ static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
struct rb_node *rb_node = rb_first(tree);
while (rb_node) {
- double percent;
chain = rb_entry(rb_node, struct callchain_node, rb_node);
- percent = chain->hit * 100.0 / total_samples;
- ret += fprintf(fp, "%.2f%% ", percent);
+ ret += callchain_node__fprintf_value(chain, fp, total_samples);
+ ret += fprintf(fp, " ");
ret += __callchain__fprintf_folded(fp, chain);
ret += fprintf(fp, "\n");
if (++entries_printed == callchain_param.print_limit)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 08cb220ba5ea..e2ef9b38acb6 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -805,6 +805,35 @@ char *callchain_list__sym_name(struct callchain_list *cl,
return bf;
}
+char *callchain_node__sprintf_value(struct callchain_node *node,
+ char *bf, size_t bfsize, u64 total)
+{
+ double percent = 0.0;
+ u64 period = callchain_cumul_hits(node);
+
+ if (callchain_param.mode == CHAIN_FOLDED)
+ period = node->hit;
+ if (total)
+ percent = period * 100.0 / total;
+
+ scnprintf(bf, bfsize, "%.2f%%", percent);
+ return bf;
+}
+
+int callchain_node__fprintf_value(struct callchain_node *node,
+ FILE *fp, u64 total)
+{
+ double percent = 0.0;
+ u64 period = callchain_cumul_hits(node);
+
+ if (callchain_param.mode == CHAIN_FOLDED)
+ period = node->hit;
+ if (total)
+ percent = period * 100.0 / total;
+
+ return percent_color_fprintf(fp, "%.2f%%", percent);
+}
+
static void free_callchain_node(struct callchain_node *node)
{
struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 544d99ac169c..f9e00e3d1243 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -230,6 +230,10 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
char *callchain_list__sym_name(struct callchain_list *cl,
char *bf, size_t bfsize, bool show_dso);
+char *callchain_node__sprintf_value(struct callchain_node *node,
+ char *bf, size_t bfsize, u64 total);
+int callchain_node__fprintf_value(struct callchain_node *node,
+ FILE *fp, u64 total);
void free_callchain(struct callchain_root *root);
--
2.6.2
It's to track the count of occurrences of the callchains.
Acked-by: Brendan Gregg <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/callchain.c | 10 ++++++++++
tools/perf/util/callchain.h | 7 +++++++
2 files changed, 17 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e2ef9b38acb6..60754de700d4 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -437,6 +437,8 @@ add_child(struct callchain_node *parent,
new->children_hit = 0;
new->hit = period;
+ new->children_count = 0;
+ new->count = 1;
return new;
}
@@ -484,6 +486,9 @@ split_add_child(struct callchain_node *parent,
parent->children_hit = callchain_cumul_hits(new);
new->val_nr = parent->val_nr - idx_local;
parent->val_nr = idx_local;
+ new->count = parent->count;
+ new->children_count = parent->children_count;
+ parent->children_count = callchain_cumul_counts(new);
/* create a new child for the new branch if any */
if (idx_total < cursor->nr) {
@@ -494,6 +499,8 @@ split_add_child(struct callchain_node *parent,
parent->hit = 0;
parent->children_hit += period;
+ parent->count = 0;
+ parent->children_count += 1;
node = callchain_cursor_current(cursor);
new = add_child(parent, cursor, period);
@@ -516,6 +523,7 @@ split_add_child(struct callchain_node *parent,
rb_insert_color(&new->rb_node_in, &parent->rb_root_in);
} else {
parent->hit = period;
+ parent->count = 1;
}
}
@@ -562,6 +570,7 @@ append_chain_children(struct callchain_node *root,
inc_children_hit:
root->children_hit += period;
+ root->children_count++;
}
static int
@@ -614,6 +623,7 @@ append_chain(struct callchain_node *root,
/* we match 100% of the path, increment the hit */
if (matches == root->val_nr && cursor->pos == cursor->nr) {
root->hit += period;
+ root->count++;
return 0;
}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index f9e00e3d1243..0e6cc83f1a46 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -60,6 +60,8 @@ struct callchain_node {
struct rb_root rb_root_in; /* input tree of children */
struct rb_root rb_root; /* sorted output tree of children */
unsigned int val_nr;
+ unsigned int count;
+ unsigned int children_count;
u64 hit;
u64 children_hit;
};
@@ -145,6 +147,11 @@ static inline u64 callchain_cumul_hits(struct callchain_node *node)
return node->hit + node->children_hit;
}
+static inline unsigned callchain_cumul_counts(struct callchain_node *node)
+{
+ return node->count + node->children_count;
+}
+
int callchain_register_param(struct callchain_param *param);
int callchain_append(struct callchain_root *root,
struct callchain_cursor *cursor,
--
2.6.2
Now -g/--call-graph option supports how to display callchain values.
Possible values are 'percent', 'period' and 'count'. The percent is
same as before and it's the default behavior. The period displays the
raw period value rather than the percentage. The count displays the
number of occurrences.
$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--28.63%-- start_secondary
|
--11.30%-- rest_init
$ perf report --no-children --show-total-period --stdio -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--9334403-- start_secondary
|
--3684302-- rest_init
$ perf report --no-children --show-nr-samples --stdio -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--57-- start_secondary
|
--23-- rest_init
Acked-by: Brendan Gregg <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 13 ++++---
tools/perf/builtin-report.c | 4 +--
tools/perf/ui/stdio/hist.c | 10 +++++-
tools/perf/util/callchain.c | 62 +++++++++++++++++++++++++++-----
tools/perf/util/callchain.h | 10 +++++-
tools/perf/util/util.c | 3 +-
6 files changed, 84 insertions(+), 18 deletions(-)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f7d81aac9188..dab99ed2b339 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -170,11 +170,11 @@ OPTIONS
Dump raw trace in ASCII.
-g::
---call-graph=<print_type,threshold[,print_limit],order,sort_key,branch>::
+--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
Display call chains using type, min percent threshold, print limit,
- call order, sort key and branch. Note that ordering of parameters is not
- fixed so any parement can be given in an arbitraty order. One exception
- is the print_limit which should be preceded by threshold.
+ call order, sort key, optional branch and value. Note that ordering of
+ parameters is not fixed so any parement can be given in an arbitraty order.
+ One exception is the print_limit which should be preceded by threshold.
print_type can be either:
- flat: single column, linear exposure of call chains.
@@ -205,6 +205,11 @@ OPTIONS
- branch: include last branch information in callgraph when available.
Usually more convenient to use --branch-history for this.
+ value can be:
+ - percent: diplay overhead percent (default)
+ - period: display event period
+ - count: display event count
+
--children::
Accumulate callchain of children to parent entry so that then can
show up in the output. The output will have a new "Children" column
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2853ad2bd435..3dd4bb4ded1a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -625,7 +625,7 @@ parse_percent_limit(const struct option *opt, const char *str,
return 0;
}
-#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function"
+#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent"
const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
CALLCHAIN_REPORT_HELP
@@ -708,7 +708,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
"Only display entries with parent-match"),
OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
- "print_type,threshold[,print_limit],order,sort_key[,branch]",
+ "print_type,threshold[,print_limit],order,sort_key[,branch],value",
report_callchain_help, &report_parse_callchain_opt,
callchain_default_opt),
OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 9b5a0d8a43dc..c0580f5ea7ea 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -81,13 +81,14 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
int depth_mask, int left_margin)
{
struct rb_node *node, *next;
- struct callchain_node *child;
+ struct callchain_node *child = NULL;
struct callchain_list *chain;
int new_depth_mask = depth_mask;
u64 remaining;
size_t ret = 0;
int i;
uint entries_printed = 0;
+ int cumul_count = 0;
remaining = total_samples;
@@ -99,6 +100,7 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
child = rb_entry(node, struct callchain_node, rb_node);
cumul = callchain_cumul_hits(child);
remaining -= cumul;
+ cumul_count += callchain_cumul_counts(child);
/*
* The depth mask manages the output of pipes that show
@@ -148,6 +150,12 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
if (!rem_sq_bracket)
return ret;
+ if (callchain_param.value == CCVAL_COUNT && child && child->parent) {
+ rem_node.count = child->parent->children_count - cumul_count;
+ if (rem_node.count <= 0)
+ return ret;
+ }
+
new_depth_mask &= ~(1 << (depth - 1));
ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
new_depth_mask, 0, total_samples,
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 60754de700d4..f3f1b95b808e 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -83,6 +83,23 @@ static int parse_callchain_sort_key(const char *value)
return -1;
}
+static int parse_callchain_value(const char *value)
+{
+ if (!strncmp(value, "percent", strlen(value))) {
+ callchain_param.value = CCVAL_PERCENT;
+ return 0;
+ }
+ if (!strncmp(value, "period", strlen(value))) {
+ callchain_param.value = CCVAL_PERIOD;
+ return 0;
+ }
+ if (!strncmp(value, "count", strlen(value))) {
+ callchain_param.value = CCVAL_COUNT;
+ return 0;
+ }
+ return -1;
+}
+
static int
__parse_callchain_report_opt(const char *arg, bool allow_record_opt)
{
@@ -106,7 +123,8 @@ __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
if (!parse_callchain_mode(tok) ||
!parse_callchain_order(tok) ||
- !parse_callchain_sort_key(tok)) {
+ !parse_callchain_sort_key(tok) ||
+ !parse_callchain_value(tok)) {
/* parsing ok - move on to the next */
try_stack_size = false;
goto next;
@@ -820,13 +838,27 @@ char *callchain_node__sprintf_value(struct callchain_node *node,
{
double percent = 0.0;
u64 period = callchain_cumul_hits(node);
+ unsigned count = callchain_cumul_counts(node);
- if (callchain_param.mode == CHAIN_FOLDED)
+ if (callchain_param.mode == CHAIN_FOLDED) {
period = node->hit;
- if (total)
- percent = period * 100.0 / total;
+ count = node->count;
+ }
- scnprintf(bf, bfsize, "%.2f%%", percent);
+ switch (callchain_param.value) {
+ case CCVAL_PERIOD:
+ scnprintf(bf, bfsize, "%"PRIu64, period);
+ break;
+ case CCVAL_COUNT:
+ scnprintf(bf, bfsize, "%u", count);
+ break;
+ case CCVAL_PERCENT:
+ default:
+ if (total)
+ percent = period * 100.0 / total;
+ scnprintf(bf, bfsize, "%.2f%%", percent);
+ break;
+ }
return bf;
}
@@ -835,13 +867,25 @@ int callchain_node__fprintf_value(struct callchain_node *node,
{
double percent = 0.0;
u64 period = callchain_cumul_hits(node);
+ unsigned count = callchain_cumul_counts(node);
- if (callchain_param.mode == CHAIN_FOLDED)
+ if (callchain_param.mode == CHAIN_FOLDED) {
period = node->hit;
- if (total)
- percent = period * 100.0 / total;
+ count = node->count;
+ }
- return percent_color_fprintf(fp, "%.2f%%", percent);
+ switch (callchain_param.value) {
+ case CCVAL_PERIOD:
+ return fprintf(fp, "%"PRIu64, period);
+ case CCVAL_COUNT:
+ return fprintf(fp, "%u", count);
+ case CCVAL_PERCENT:
+ default:
+ if (total)
+ percent = period * 100.0 / total;
+ return percent_color_fprintf(fp, "%.2f%%", percent);
+ }
+ return 0;
}
static void free_callchain_node(struct callchain_node *node)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 0e6cc83f1a46..b14d760fc4e3 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -29,7 +29,8 @@
HELP_PAD "print_limit:\tmaximum number of call graph entry (<number>)\n" \
HELP_PAD "order:\t\tcall graph order (caller|callee)\n" \
HELP_PAD "sort_key:\tcall graph sort key (function|address)\n" \
- HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n"
+ HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n" \
+ HELP_PAD "value:\t\tcall graph value (percent|period|count)\n"
enum perf_call_graph_mode {
CALLCHAIN_NONE,
@@ -81,6 +82,12 @@ enum chain_key {
CCKEY_ADDRESS
};
+enum chain_value {
+ CCVAL_PERCENT,
+ CCVAL_PERIOD,
+ CCVAL_COUNT,
+};
+
struct callchain_param {
bool enabled;
enum perf_call_graph_mode record_mode;
@@ -93,6 +100,7 @@ struct callchain_param {
bool order_set;
enum chain_key key;
bool branch_callstack;
+ enum chain_value value;
};
extern struct callchain_param callchain_param;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index cd12c25e4ea4..174912f87913 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -20,7 +20,8 @@ struct callchain_param callchain_param = {
.mode = CHAIN_GRAPH_ABS,
.min_percent = 0.5,
.order = ORDER_CALLEE,
- .key = CCKEY_FUNCTION
+ .key = CCKEY_FUNCTION,
+ .value = CCVAL_PERCENT,
};
/*
--
2.6.2
Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> Hello,
>
> This is what Brendan requested on the perf-users mailing list [1] to
> support FlameGraphs [2] more efficiently. This patchset adds a few
> more callchain options to adjust the output for it.
>
> * changes in v4)
> - add missing doc update
> - cleanup/fix callchain value print code
> - add Acked-by from Brendan and Jiri
Do those Acked-by stand? Things changed, the values moved from the end
of the line to the start, etc.
You said you would consider having a --no-hists, but I see nothing about
it in this patchkit.
Some more comments below.
- Arnaldo
> * changes in v3)
> - put the value before callchains
> - fix compile error
>
>
> At first, 'folded' output mode was added. The folded output puts the
> value, a space and all calchain nodes separated by semicolons. Now it
> only supports --stdio as other UI provides some way of folding and/or
> expanding callchains dynamically.
>
> The value is now can be one of 'percent', 'period', or 'count'. The
> percent is current default output and the period is the raw number of
> sample periods. The count is the number of samples for each callchain.
>
> Here's an example:
>
> $ perf report --no-children --show-nr-samples --stdio -g folded,count
> ...
> 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> 57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
> 23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
>
>
> $ perf report --no-children --stdio -g percent
So, in this first one you show the percent in both
> ...
> 39.93% swapper [kernel.vmlinux] [k] intel_idel
> |
> ---intel_idle
> cpuidle_enter_state
> cpuidle_enter
> call_cpuidle
> cpu_startup_entry
> |
> |--28.63%-- start_secondary
> |
> --11.30%-- rest_init
>
>
> $ perf report --no-children --stdio --show-total-period -g period
> ...
then here you _add_ the period to the hist_entry line, but...
> 39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idel
> |
> ---intel_idle
> cpuidle_enter_state
> cpuidle_enter
> call_cpuidle
> cpu_startup_entry
> |
_replace_ the percentage with the period in the callchains.
Can't we have the same effect in both? I.e. I would expect the 39.93% to
simply be replaced with that 13018705.
> |--9334403-- start_secondary
> |
> --3684302-- rest_init
>
>
> $ perf report --no-children --stdio --show-nr-samples -g count
> ...
> 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
Ditto for count
> |
> ---intel_idle
> cpuidle_enter_state
> cpuidle_enter
> call_cpuidle
> cpu_startup_entry
> |
> |--57-- start_secondary
> |
> --23-- rest_init
>
>
> You can get it from 'perf/callchain-fold-v4' branch on my tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Any comments are welcome, thanks
> Namhyung
>
>
> [1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
> [2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
>
>
> Namhyung Kim (4):
> perf report: Support folded callchain mode on --stdio
> perf callchain: Abstract callchain print function
> perf callchain: Add count fields to struct callchain_node
> perf report: Add callchain value option
>
> tools/perf/Documentation/perf-report.txt | 13 +++--
> tools/perf/builtin-report.c | 4 +-
> tools/perf/ui/browsers/hists.c | 8 +--
> tools/perf/ui/gtk/hists.c | 8 +--
> tools/perf/ui/stdio/hist.c | 93 ++++++++++++++++++++++++++------
> tools/perf/util/callchain.c | 87 +++++++++++++++++++++++++++++-
> tools/perf/util/callchain.h | 24 ++++++++-
> tools/perf/util/util.c | 3 +-
> 8 files changed, 205 insertions(+), 35 deletions(-)
>
> --
> 2.6.2
On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
<[email protected]> wrote:
> Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> Hello,
>>
>> This is what Brendan requested on the perf-users mailing list [1] to
>> support FlameGraphs [2] more efficiently. This patchset adds a few
>> more callchain options to adjust the output for it.
>>
>> * changes in v4)
>> - add missing doc update
>> - cleanup/fix callchain value print code
>> - add Acked-by from Brendan and Jiri
>
> Do those Acked-by stand? Things changed, the values moved from the end
> of the line to the start, etc.
>
[...]
I'd Ack this change as it's a useful addition. It doesn't quite
address the folded-only output, but it's a step in that direction. I
think having the value at the start of a line only makes sense for the
perf report output containing the hist summary lines, for consistency.
Here's how I'd shuffle the output of this patch (ignore word wrap
issues with this email):
# ./perf report --stdio -g folded,count,caller -F pid | \
awk '/^ / { n = $1 }
/^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
809
swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
135
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
63
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
54
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
3
dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
So the output is folded stacks, prefixed by comm-PID. Shuffling the
summarized output is a lot better than doing a "perf script" dump and
re-processing call chains. (Note that since I'm using -F, I didn't
need --no-children; and with "-g count", I didn't need
--show-nr-samples.)
I notice the fields (-F) option already has this precedent:
- "comm": prints PID:comm
- "pid": prints PID
If these were added to -g, along with a no-hists, then the two types
of folded-only output could be generated using:
perf report --stdio -g folded,count,comm,no-hists,caller
perf report --stdio -g folded,count,pid,no-hists,caller
... although "no-hists" doesn't hit me as intuitive. How about "-F
none" to specify zero columns? ie:
perf report --stdio -g folded,count,comm,caller -F none
perf report --stdio -g folded,count,pid,caller -F none
Brendan
Hi Brendan,
On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> >> Hello,
> >>
> >> This is what Brendan requested on the perf-users mailing list [1] to
> >> support FlameGraphs [2] more efficiently. This patchset adds a few
> >> more callchain options to adjust the output for it.
> >>
> >> * changes in v4)
> >> - add missing doc update
> >> - cleanup/fix callchain value print code
> >> - add Acked-by from Brendan and Jiri
> >
> > Do those Acked-by stand? Things changed, the values moved from the end
> > of the line to the start, etc.
> >
> [...]
>
> I'd Ack this change as it's a useful addition. It doesn't quite
> address the folded-only output, but it's a step in that direction. I
> think having the value at the start of a line only makes sense for the
> perf report output containing the hist summary lines, for consistency.
Right, thanks!
>
> Here's how I'd shuffle the output of this patch (ignore word wrap
> issues with this email):
>
> # ./perf report --stdio -g folded,count,caller -F pid | \
> awk '/^ / { n = $1 }
> /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> 809
> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> 135
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
> 63
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
> 54
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
> 3
> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
>
> So the output is folded stacks, prefixed by comm-PID. Shuffling the
> summarized output is a lot better than doing a "perf script" dump and
> re-processing call chains. (Note that since I'm using -F, I didn't
> need --no-children;
Nope. The '-F pid' doesn't affect --children. It doesn't show the
children overhead column but we still have hist entries for
(synthesized) children..
$ perf report --no-children | wc -l
998
$ perf report --no-children -F pid,dso,sym | wc -l
998
$ perf report --children | wc -l
3229
$ perf report --children -F pid,dso,sym | wc -l
3202
So I think you still need to use --no-children (or set report.children
config variable to false) for your script.
> and with "-g count", I didn't need --show-nr-samples.)
Yes, I used -n/--show-nr-samples just to check the number is correct.
>
> I notice the fields (-F) option already has this precedent:
>
> - "comm": prints PID:comm
> - "pid": prints PID
It's opposite: "comm" prints comm, "pid" prints PID:comm. :)
>
> If these were added to -g, along with a no-hists, then the two types
> of folded-only output could be generated using:
>
> perf report --stdio -g folded,count,comm,no-hists,caller
> perf report --stdio -g folded,count,pid,no-hists,caller
As I said, using fields like comm, pid requires to have same keys in
--sort option. So it's basically unreliable to use those specific
field names in the -g option IMHO. I suggested to use 'info' (yes, it
needs better name) to print all sort keys.
>
> ... although "no-hists" doesn't hit me as intuitive. How about "-F
> none" to specify zero columns? ie:
>
> perf report --stdio -g folded,count,comm,caller -F none
> perf report --stdio -g folded,count,pid,caller -F none
Ah, makes sense. So it'd look like
$ perf report --stdio -g folded,count,info -F none -s comm
$ perf report --stdio -g folded,count,info -F none -s pid
The output would be
809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
Thoughts?
Thanks,
Namhyung
On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <[email protected]> wrote:
> Hi Brendan,
>
> On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
>> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
>> <[email protected]> wrote:
>> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> >> Hello,
>> >>
>> >> This is what Brendan requested on the perf-users mailing list [1] to
>> >> support FlameGraphs [2] more efficiently. This patchset adds a few
>> >> more callchain options to adjust the output for it.
>> >>
>> >> * changes in v4)
>> >> - add missing doc update
>> >> - cleanup/fix callchain value print code
>> >> - add Acked-by from Brendan and Jiri
>> >
>> > Do those Acked-by stand? Things changed, the values moved from the end
>> > of the line to the start, etc.
>> >
>> [...]
>>
>> I'd Ack this change as it's a useful addition. It doesn't quite
>> address the folded-only output, but it's a step in that direction. I
>> think having the value at the start of a line only makes sense for the
>> perf report output containing the hist summary lines, for consistency.
>
> Right, thanks!
>
>
>>
>> Here's how I'd shuffle the output of this patch (ignore word wrap
>> issues with this email):
>>
>> # ./perf report --stdio -g folded,count,caller -F pid | \
>> awk '/^ / { n = $1 }
>> /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
>> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>> 809
>> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>> 135
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
>> 63
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
>> 54
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
>> 3
>> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
>>
>> So the output is folded stacks, prefixed by comm-PID. Shuffling the
>> summarized output is a lot better than doing a "perf script" dump and
>> re-processing call chains. (Note that since I'm using -F, I didn't
>> need --no-children;
>
> Nope. The '-F pid' doesn't affect --children. It doesn't show the
> children overhead column but we still have hist entries for
> (synthesized) children..
>
> $ perf report --no-children | wc -l
> 998
>
> $ perf report --no-children -F pid,dso,sym | wc -l
> 998
>
> $ perf report --children | wc -l
> 3229
>
> $ perf report --children -F pid,dso,sym | wc -l
> 3202
>
> So I think you still need to use --no-children (or set report.children
> config variable to false) for your script.
Ok, good to know, thanks.
>
>
>> and with "-g count", I didn't need --show-nr-samples.)
>
> Yes, I used -n/--show-nr-samples just to check the number is correct.
>
>
>>
>> I notice the fields (-F) option already has this precedent:
>>
>> - "comm": prints PID:comm
>> - "pid": prints PID
>
> It's opposite: "comm" prints comm, "pid" prints PID:comm. :)
Ah, right, sorry, I'd typed those the wrong way around. :)
>
>
>>
>> If these were added to -g, along with a no-hists, then the two types
>> of folded-only output could be generated using:
>>
>> perf report --stdio -g folded,count,comm,no-hists,caller
>> perf report --stdio -g folded,count,pid,no-hists,caller
>
> As I said, using fields like comm, pid requires to have same keys in
> --sort option. So it's basically unreliable to use those specific
> field names in the -g option IMHO. I suggested to use 'info' (yes, it
> needs better name) to print all sort keys.
>
>
>>
>> ... although "no-hists" doesn't hit me as intuitive. How about "-F
>> none" to specify zero columns? ie:
>>
>> perf report --stdio -g folded,count,comm,caller -F none
>> perf report --stdio -g folded,count,pid,caller -F none
>
> Ah, makes sense. So it'd look like
>
> $ perf report --stdio -g folded,count,info -F none -s comm
> $ perf report --stdio -g folded,count,info -F none -s pid
>
> The output would be
>
> 809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>
Thanks, looks almost right: a couple of minor changes:
1. If perf already has the precedent of "PID:comm", instead of my
"comm-PID", then maybe it should use "PID:comm" for perf consistency.
Doesn't make much difference to me.
2. The second space, delimiting "PID:comm" (or comm) and the stack...
I'm nervous about using space as a delimiter any more than once, since
it can also appear in comm (eg, "java main") and frames (eg,
"JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
Thread*)" -- that's direct from "perf script"!). I'd consider making
it a semicolon:
809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
So the output is "value key", and key is a semicolon delimited stack
with an optional comm or PID:comm frame at the start.
Brendan
Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <[email protected]> wrote:
> > Ah, makes sense. So it'd look like
> > $ perf report --stdio -g folded,count,info -F none -s comm
> > $ perf report --stdio -g folded,count,info -F none -s pid
> > The output would be
> > 809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> Thanks, looks almost right: a couple of minor changes:
> 1. If perf already has the precedent of "PID:comm", instead of my
> "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> Doesn't make much difference to me.
> 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> I'm nervous about using space as a delimiter any more than once, since
> it can also appear in comm (eg, "java main") and frames (eg,
> "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> Thread*)" -- that's direct from "perf script"!). I'd consider making
> it a semicolon:
> 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> So the output is "value key", and key is a semicolon delimited stack
> with an optional comm or PID:comm frame at the start.
Agreed, but then, we can have some sort of default and also be able to,
using -F, specify what are the fields we want, and in which order, and I
liked your suggestion of being able to specify "-F none" and that mean
no hist line to be produced.
Likewise, the way that each callchain line should be formatted should be
programmable via the command line, via the -g option, no? Then script
writers could use it in a way that doesn't requires further processing,
as Brendan showed.
But yeah, the value is the semicolon delimited stack all the way to the
comm/PID:comm if there are more than one or if the user asks it to be
there via a -g keyword, all the other counts/info are just relative to
that, CSV or whatever other delimiter the user asks it to, and space is
not an option, as we know it can appear in the middle of a COMM:
[root@zoo ~]# perf report -s comm | grep '[a-zA-Z] [a-zA-Z]'
# To display the perf.data header info, please use
# --header/--header-only options.
# Total Lost Samples: 0
# Samples: 164K of event 'cycles:pp'
# Event count (approx.): 34422160859
0.11% DOM Worker
0.10% JS Helper
0.01% Qt bearer threa
0.00% Socket Thread
0.00% dconf worker
0.00% JS Watchdog
[root@zoo ~]#
- Arnaldo
Hi Arnaldo and Brendan,
On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <[email protected]> wrote:
> > > Ah, makes sense. So it'd look like
>
> > > $ perf report --stdio -g folded,count,info -F none -s comm
> > > $ perf report --stdio -g folded,count,info -F none -s pid
>
> > > The output would be
>
> > > 809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>
> > Thanks, looks almost right: a couple of minor changes:
>
> > 1. If perf already has the precedent of "PID:comm", instead of my
> > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > Doesn't make much difference to me.
Right. Actually I'd like to write it that way.. ;-)
> > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > I'm nervous about using space as a delimiter any more than once, since
> > it can also appear in comm (eg, "java main") and frames (eg,
> > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > it a semicolon:
Fair enough.
>
> > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
>
> > So the output is "value key", and key is a semicolon delimited stack
> > with an optional comm or PID:comm frame at the start.
>
> Agreed, but then, we can have some sort of default and also be able to,
> using -F, specify what are the fields we want, and in which order, and I
> liked your suggestion of being able to specify "-F none" and that mean
> no hist line to be produced.
>
> Likewise, the way that each callchain line should be formatted should be
> programmable via the command line, via the -g option, no? Then script
> writers could use it in a way that doesn't requires further processing,
> as Brendan showed.
Right. So '-s <key1>[,<key2>,...] -g info' can control which info is
displayed along with the callchains.
$ perf report -s comm,dso -g folded,count,info -F none
809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
Note that the info part (swapper;[kernel.vmlinux]) is also separated
by a semicolon. But I think it's ok since it's controlled by command
line, so script can know how many entries will be.
>
> But yeah, the value is the semicolon delimited stack all the way to the
> comm/PID:comm if there are more than one or if the user asks it to be
> there via a -g keyword, all the other counts/info are just relative to
> that, CSV or whatever other delimiter the user asks it to, and space is
> not an option, as we know it can appear in the middle of a COMM:
Yes, I think that we should use a given separator (using -t option)
instead of hard-coded semicolon. Although it'd be rare, it seems
possible to use semicolons in the comm name too.
Thanks,
Namhyung
>
> [root@zoo ~]# perf report -s comm | grep '[a-zA-Z] [a-zA-Z]'
> # To display the perf.data header info, please use
> # --header/--header-only options.
> # Total Lost Samples: 0
> # Samples: 164K of event 'cycles:pp'
> # Event count (approx.): 34422160859
> 0.11% DOM Worker
> 0.10% JS Helper
> 0.01% Qt bearer threa
> 0.00% Socket Thread
> 0.00% dconf worker
> 0.00% JS Watchdog
> [root@zoo ~]#
>
> - Arnaldo
Em Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim escreveu:
> Hi Arnaldo and Brendan,
>
> On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <[email protected]> wrote:
> > > > Ah, makes sense. So it'd look like
> >
> > > > $ perf report --stdio -g folded,count,info -F none -s comm
> > > > $ perf report --stdio -g folded,count,info -F none -s pid
> >
> > > > The output would be
> >
> > > > 809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> >
> > > Thanks, looks almost right: a couple of minor changes:
> >
> > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > Doesn't make much difference to me.
> Right. Actually I'd like to write it that way.. ;-)
Well, those are two pieces of information: "comm" and "pid", so it would
be nice that we could take this opportunity to remove it, i.e. just
treat it as any other field and separate it via the designated
separator, and only show the ones specified.
> > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > I'm nervous about using space as a delimiter any more than once, since
> > > it can also appear in comm (eg, "java main") and frames (eg,
> > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > it a semicolon:
The C++ symbol names are the biggest challenge here for a single line in
CSV ("comma" quoted) record :-\
> Fair enough.
> > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> >
> > > So the output is "value key", and key is a semicolon delimited stack
> > > with an optional comm or PID:comm frame at the start.
> >
> > Agreed, but then, we can have some sort of default and also be able to,
> > using -F, specify what are the fields we want, and in which order, and I
> > liked your suggestion of being able to specify "-F none" and that mean
> > no hist line to be produced.
> >
> > Likewise, the way that each callchain line should be formatted should be
> > programmable via the command line, via the -g option, no? Then script
> > writers could use it in a way that doesn't requires further processing,
> > as Brendan showed.
>
> Right. So '-s <key1>[,<key2>,...] -g info' can control which info is
> displayed along with the callchains.
So you force the same selection of fields to be used for both the
hist_entry and the callchains?
And why is that some of the fields will be selected via -s (comm, dso)
and other fields will be selected via -g (count, this "info" thing)?
Why not be flexible and allow any set of fields to be used in both
cases, without one being tied to the other?
I.e. instead of:
-s <key1>[,<key2>,...] -g info
We use:
-s <key1>[,<key2>,...] -g [<keyA>[,<keyB>],...]
If one would want to have the same set for both, then yeah, a keyword
for that would be interesting, reusing your "info":
-s <key1>[,<key2>,...] -g info
Would mean:
-s <key1>[,<key2>,...] -g [<key1>[,<key2>],...]
With both ... equal
But "info" is way too vague, perhaps "hist_keys", or something more
compact, like: "\-s", to reuse the semantic of regular expression groups
(\1).
> $ perf report -s comm,dso -g folded,count,info -F none
> 809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
> Note that the info part (swapper;[kernel.vmlinux]) is also separated
> by a semicolon. But I think it's ok since it's controlled by command
> line, so script can know how many entries will be.
> > But yeah, the value is the semicolon delimited stack all the way to the
> > comm/PID:comm if there are more than one or if the user asks it to be
> > there via a -g keyword, all the other counts/info are just relative to
> > that, CSV or whatever other delimiter the user asks it to, and space is
> > not an option, as we know it can appear in the middle of a COMM:
>
> Yes, I think that we should use a given separator (using -t option)
> instead of hard-coded semicolon. Although it'd be rare, it seems
> possible to use semicolons in the comm name too.
Well, we can have an option to specify what would be the separator for
the callchains.
- Arnaldo
Hi Arnaldo,
On Wed, Nov 04, 2015 at 03:08:58PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim escreveu:
> > Hi Arnaldo and Brendan,
> >
> > On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <[email protected]> wrote:
> > > > > Ah, makes sense. So it'd look like
> > >
> > > > > $ perf report --stdio -g folded,count,info -F none -s comm
> > > > > $ perf report --stdio -g folded,count,info -F none -s pid
> > >
> > > > > The output would be
> > >
> > > > > 809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> > >
> > > > Thanks, looks almost right: a couple of minor changes:
> > >
> > > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > > Doesn't make much difference to me.
>
> > Right. Actually I'd like to write it that way.. ;-)
>
> Well, those are two pieces of information: "comm" and "pid", so it would
> be nice that we could take this opportunity to remove it, i.e. just
> treat it as any other field and separate it via the designated
> separator, and only show the ones specified.
So do you want to change '-s pid' to print 'PID' part only?
>
> > > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > > I'm nervous about using space as a delimiter any more than once, since
> > > > it can also appear in comm (eg, "java main") and frames (eg,
> > > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > > it a semicolon:
>
> The C++ symbol names are the biggest challenge here for a single line in
> CSV ("comma" quoted) record :-\
>
> > Fair enough.
>
> > > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> > >
> > > > So the output is "value key", and key is a semicolon delimited stack
> > > > with an optional comm or PID:comm frame at the start.
> > >
> > > Agreed, but then, we can have some sort of default and also be able to,
> > > using -F, specify what are the fields we want, and in which order, and I
> > > liked your suggestion of being able to specify "-F none" and that mean
> > > no hist line to be produced.
> > >
> > > Likewise, the way that each callchain line should be formatted should be
> > > programmable via the command line, via the -g option, no? Then script
> > > writers could use it in a way that doesn't requires further processing,
> > > as Brendan showed.
> >
> > Right. So '-s <key1>[,<key2>,...] -g info' can control which info is
> > displayed along with the callchains.
>
> So you force the same selection of fields to be used for both the
> hist_entry and the callchains?
Yes.
>
> And why is that some of the fields will be selected via -s (comm, dso)
> and other fields will be selected via -g (count, this "info" thing)?
Because it affects how hist entries are aggregated..
>
> Why not be flexible and allow any set of fields to be used in both
> cases, without one being tied to the other?
>
> I.e. instead of:
>
> -s <key1>[,<key2>,...] -g info
>
> We use:
>
> -s <key1>[,<key2>,...] -g [<keyA>[,<keyB>],...]
But then we need to aggregate hist entries using all of key1, key2,
keyA, keyB and so on. Otherwise callchain info with keyA and keyB
might be stale.
If so, we need to group hist entries again using key1 and key2 only
for printing hist part. For example, entries for (1,2,A,B) and
(1,2,C,D) should be shown as single entry for (1,2).
I think this 'info' part is only needed when hist entries are omitted
(i.e. -F none). If so, no need to bother with new options..
>
> If one would want to have the same set for both, then yeah, a keyword
> for that would be interesting, reusing your "info":
>
> -s <key1>[,<key2>,...] -g info
>
> Would mean:
>
> -s <key1>[,<key2>,...] -g [<key1>[,<key2>],...]
>
> With both ... equal
>
> But "info" is way too vague, perhaps "hist_keys", or something more
> compact, like: "\-s", to reuse the semantic of regular expression groups
> (\1).
I prefer "hist_keys".
>
> > $ perf report -s comm,dso -g folded,count,info -F none
> > 809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
>
> > Note that the info part (swapper;[kernel.vmlinux]) is also separated
> > by a semicolon. But I think it's ok since it's controlled by command
> > line, so script can know how many entries will be.
>
> > > But yeah, the value is the semicolon delimited stack all the way to the
> > > comm/PID:comm if there are more than one or if the user asks it to be
> > > there via a -g keyword, all the other counts/info are just relative to
> > > that, CSV or whatever other delimiter the user asks it to, and space is
> > > not an option, as we know it can appear in the middle of a COMM:
> >
> > Yes, I think that we should use a given separator (using -t option)
> > instead of hard-coded semicolon. Although it'd be rare, it seems
> > possible to use semicolons in the comm name too.
>
> Well, we can have an option to specify what would be the separator for
> the callchains.
What's the problem of using a single separator?
Thanks,
Namhyung