2015-11-02 12:58:13

by Namhyung Kim

Subject: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

Hello,

This is what Brendan requested on the perf-users mailing list [1] to
support FlameGraphs [2] more efficiently. This patchset adds a few
more callchain options to adjust the output for it.

First, a 'folded' output mode is added. The folded output puts all
callchain nodes on a single line, separated by semicolons, followed by a
space and the value. For now it supports only --stdio, since the other
UIs already provide a way to fold/expand callchains dynamically.
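
As a rough illustration of the folded format, the sketch below joins already-resolved frame names with ';' and appends a space and the value. fold_line() is a hypothetical helper written for this illustration, not a perf function; it returns -1 if the buffer is too small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "frame0;frame1;...;frameN value" into buf.
 * Frames are emitted in the order given. Returns the line length, or -1
 * if buf is too small to hold the whole line.
 */
static int fold_line(char *buf, size_t bufsize,
                     const char *const *frames, size_t nr_frames,
                     const char *value)
{
    size_t len = 0;
    int n;

    assert(buf != NULL && bufsize > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < nr_frames; i++) {
        /* ';' goes before every frame except the first one. */
        n = snprintf(buf + len, bufsize - len, "%s%s",
                     i ? ";" : "", frames[i]);
        if (n < 0 || (size_t)n >= bufsize - len)
            return -1;
        len += (size_t)n;
    }

    /* A single space separates the stack from its value. */
    n = snprintf(buf + len, bufsize - len, " %s", value);
    if (n < 0 || (size_t)n >= bufsize - len)
        return -1;
    return (int)(len + (size_t)n);
}
```

For instance, the frames "intel_idle", "cpuidle_enter_state" and "cpuidle_enter" with the value "57" produce "intel_idle;cpuidle_enter_state;cpuidle_enter 57".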

The value can now be one of 'percent', 'period', or 'count'. 'percent'
is the current default output, 'period' is the raw sum of sample
periods, and 'count' is the number of samples for each callchain.
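
A minimal sketch of how the three value types could be rendered for one callchain, assuming the chain's summed period, its sample count and the total period are already known. The enum and format_value() are hypothetical names for this illustration; in the patchset the corresponding helper is callchain_node__sprintf_value().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the three value types described above. */
enum value_type { VAL_PERCENT, VAL_PERIOD, VAL_COUNT };

/*
 * Format one callchain value into buf. 'period' is the summed sample
 * period of the chain, 'count' its number of samples, and 'total' the
 * period used as the percentage base (a zero total yields 0.00%).
 */
static const char *format_value(char *buf, size_t bufsize,
                                enum value_type type, uint64_t period,
                                unsigned int count, uint64_t total)
{
    assert(buf != NULL && bufsize > 0);

    switch (type) {
    case VAL_PERIOD:
        snprintf(buf, bufsize, "%" PRIu64, period);
        break;
    case VAL_COUNT:
        snprintf(buf, bufsize, "%u", count);
        break;
    case VAL_PERCENT:
    default:
        snprintf(buf, bufsize, "%.2f%%",
                 total ? period * 100.0 / total : 0.0);
        break;
    }
    return buf;
}
```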

Here's an example:

$ perf report --no-children --show-nr-samples --stdio -g folded,count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23


$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idle
        |
        ---intel_idle
           cpuidle_enter_state
           cpuidle_enter
           call_cpuidle
           cpu_startup_entry
           |
           |--28.63%-- start_secondary
           |
            --11.30%-- rest_init


$ perf report --no-children --stdio --show-total-period -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idle
        |
        ---intel_idle
           cpuidle_enter_state
           cpuidle_enter
           call_cpuidle
           cpu_startup_entry
           |
           |--9334403-- start_secondary
           |
            --3684302-- rest_init


$ perf report --no-children --stdio --show-nr-samples -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
        |
        ---intel_idle
           cpuidle_enter_state
           cpuidle_enter
           call_cpuidle
           cpu_startup_entry
           |
           |--57-- start_secondary
           |
            --23-- rest_init


You can get it from the 'perf/callchain-fold-v2' branch on my tree:

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any comments are welcome, thanks
Namhyung


[1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
[2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html


Namhyung Kim (4):
perf report: Support folded callchain mode on --stdio
perf callchain: Abstract callchain print function
perf callchain: Add count fields to struct callchain_node
perf report: Add callchain value option

tools/perf/Documentation/perf-report.txt | 13 +++--
tools/perf/builtin-report.c | 4 +-
tools/perf/ui/browsers/hists.c | 8 +--
tools/perf/ui/gtk/hists.c | 8 +--
tools/perf/ui/stdio/hist.c | 91 ++++++++++++++++++++++++++------
tools/perf/util/callchain.c | 87 +++++++++++++++++++++++++++++-
tools/perf/util/callchain.h | 24 ++++++++-
tools/perf/util/util.c | 3 +-
8 files changed, 204 insertions(+), 34 deletions(-)

--
2.6.2


2015-11-02 12:58:21

by Namhyung Kim

Subject: [RFC/PATCH v2 1/4] perf report: Support folded callchain mode on --stdio

Add a new callchain mode (-g) 'folded' that prints each callchain on a
single line. The callchain entries are separated by semicolons, followed
by a space and then the (absolute) percent value, as in 'flat' mode.

For example, the following 20 lines can be printed as 3 lines with the
folded output mode:

$ perf report -g flat --no-children | grep -v ^# | head -20
60.48% swapper [kernel.vmlinux] [k] intel_idle
        54.60%
            intel_idle
            cpuidle_enter_state
            cpuidle_enter
            call_cpuidle
            cpu_startup_entry
            start_secondary

        5.88%
            intel_idle
            cpuidle_enter_state
            cpuidle_enter
            call_cpuidle
            cpu_startup_entry
            rest_init
            start_kernel
            x86_64_start_reservations
            x86_64_start_kernel

$ perf report -g folded --no-children | grep -v ^# | head -3
60.48% swapper [kernel.vmlinux] [k] intel_idle
intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 54.60%
intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 5.88%
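
A folded line like the ones above can be built by walking from a callchain node up through its parents and printing the parents first. The sketch below is a simplified take on the idea used by __callchain__fprintf_folded() in the patch, assuming one symbol per node (the real struct callchain_node holds a list of entries, and the patch also skips context markers). struct fnode and fold_path() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node: one symbol per node plus a parent link. */
struct fnode {
    const struct fnode *parent;
    const char *sym;
};

/*
 * Write the path from the tree root down to 'node' into buf, separated
 * by ';'. Recursing into the parent first makes the root symbol come out
 * first; a separator is added only once something has been written.
 * Returns the resulting length, or -1 if buf is too small.
 */
static int fold_path(const struct fnode *node, char *buf, size_t bufsize)
{
    int len, n;

    assert(buf != NULL && bufsize > 0);
    if (!node) {
        buf[0] = '\0';
        return 0;
    }

    len = fold_path(node->parent, buf, bufsize);
    if (len < 0)
        return -1;

    n = snprintf(buf + len, bufsize - (size_t)len, "%s%s",
                 len ? ";" : "", node->sym);
    if (n < 0 || (size_t)n >= bufsize - (size_t)len)
        return -1;
    return len + n;
}
```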

For now this mode is supported only for --stdio, and it is intended to
be used by scripts such as those for FlameGraphs [1]. Support for other
UIs might be added later.
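
A script consuming this output has to split each folded line back into its stack and trailing value. A hedged sketch in C, assuming the value is everything after the last space; split_folded() is hypothetical, and hist entry summary lines would still need to be filtered out before calling it.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical parser: split one folded line in place into its stack and
 * the value after the last space. Returns the number of ';'-separated
 * frames, or -1 if the line carries no space-separated value.
 */
static int split_folded(char *line, char **stack, char **value)
{
    char *sp;
    int frames = 1;

    assert(line && stack && value);
    line[strcspn(line, "\n")] = '\0'; /* drop a trailing newline */

    sp = strrchr(line, ' ');
    if (!sp || sp == line || sp[1] == '\0')
        return -1;

    *sp = '\0';
    *stack = line;
    *value = sp + 1;

    /* Each ';' separates two frames. */
    for (const char *p = line; *p; p++)
        if (*p == ';')
            frames++;
    return frames;
}
```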

[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

Requested-by: Brendan Gregg <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/stdio/hist.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.c | 6 +++++
tools/perf/util/callchain.h | 3 ++-
3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index dfcbc90146ef..2c7436241912 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -260,6 +260,56 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
return ret;
}

+static size_t __callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
+{
+ struct callchain_list *chain;
+ size_t ret = 0;
+ char bf[1024];
+ bool first;
+
+ if (!node)
+ return 0;
+
+ ret += __callchain__fprintf_folded(fp, node->parent);
+
+ first = (ret == 0);
+ list_for_each_entry(chain, &node->val, list) {
+ if (chain->ip >= PERF_CONTEXT_MAX)
+ continue;
+ ret += fprintf(fp, "%s%s", first ? "" : ";",
+ callchain_list__sym_name(chain,
+ bf, sizeof(bf), false));
+ first = false;
+ }
+
+ return ret;
+}
+
+static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
+ u64 total_samples)
+{
+ size_t ret = 0;
+ u32 entries_printed = 0;
+ struct callchain_node *chain;
+ struct rb_node *rb_node = rb_first(tree);
+
+ while (rb_node) {
+ double percent;
+
+ chain = rb_entry(rb_node, struct callchain_node, rb_node);
+ percent = chain->hit * 100.0 / total_samples;
+
+ ret += __callchain__fprintf_folded(fp, chain);
+ ret += fprintf(fp, " %6.2f%%\n", percent);
+ if (++entries_printed == callchain_param.print_limit)
+ break;
+
+ rb_node = rb_next(rb_node);
+ }
+
+ return ret;
+}
+
static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
u64 total_samples, int left_margin,
FILE *fp)
@@ -278,6 +328,9 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
case CHAIN_FLAT:
return callchain__fprintf_flat(fp, &he->sorted_chain, total_samples);
break;
+ case CHAIN_FOLDED:
+ return callchain__fprintf_folded(fp, &he->sorted_chain, total_samples);
+ break;
case CHAIN_NONE:
break;
default:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 735ad48e1858..08cb220ba5ea 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -44,6 +44,10 @@ static int parse_callchain_mode(const char *value)
callchain_param.mode = CHAIN_GRAPH_REL;
return 0;
}
+ if (!strncmp(value, "folded", strlen(value))) {
+ callchain_param.mode = CHAIN_FOLDED;
+ return 0;
+ }
return -1;
}

@@ -218,6 +222,7 @@ rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,

switch (mode) {
case CHAIN_FLAT:
+ case CHAIN_FOLDED:
if (rnode->hit < chain->hit)
p = &(*p)->rb_left;
else
@@ -338,6 +343,7 @@ int callchain_register_param(struct callchain_param *param)
param->sort = sort_chain_graph_rel;
break;
case CHAIN_FLAT:
+ case CHAIN_FOLDED:
param->sort = sort_chain_flat;
break;
case CHAIN_NONE:
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index fce8161e54db..2f305384531f 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -43,7 +43,8 @@ enum chain_mode {
CHAIN_NONE,
CHAIN_FLAT,
CHAIN_GRAPH_ABS,
- CHAIN_GRAPH_REL
+ CHAIN_GRAPH_REL,
+ CHAIN_FOLDED,
};

enum chain_order {
--
2.6.2

2015-11-02 12:59:54

by Namhyung Kim

Subject: [RFC/PATCH v2 2/4] perf callchain: Abstract callchain print function

This is a preparation for printing other types of callchain values,
such as count or period.

Cc: Brendan Gregg <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 8 +++++---
tools/perf/ui/gtk/hists.c | 8 ++------
tools/perf/ui/stdio/hist.c | 36 ++++++++++++++++++------------------
tools/perf/util/callchain.c | 25 +++++++++++++++++++++++++
tools/perf/util/callchain.h | 4 ++++
5 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e5afb8936040..a8897aab4c4a 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -592,7 +592,6 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
while (node) {
struct callchain_node *child = rb_entry(node, struct callchain_node, rb_node);
struct rb_node *next = rb_next(node);
- u64 cumul = callchain_cumul_hits(child);
struct callchain_list *chain;
char folded_sign = ' ';
int first = true;
@@ -619,9 +618,12 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
browser->show_dso);

if (was_first && need_percent) {
- double percent = cumul * 100.0 / total;
+ char buf[64];

- if (asprintf(&alloc_str, "%2.2f%% %s", percent, str) < 0)
+ callchain_node__sprintf_value(child, buf, sizeof(buf),
+ total);
+
+ if (asprintf(&alloc_str, "%s %s", buf, str) < 0)
str = "Not enough memory!";
else
str = alloc_str;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4b3585eed1e8..d8037b7023e8 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -100,14 +100,10 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
struct callchain_list *chain;
GtkTreeIter iter, new_parent;
bool need_new_parent;
- double percent;
- u64 hits, child_total;
+ u64 child_total;

node = rb_entry(nd, struct callchain_node, rb_node);

- hits = callchain_cumul_hits(node);
- percent = 100.0 * hits / total;
-
new_parent = *parent;
need_new_parent = !has_single_node && (node->val_nr > 1);

@@ -116,7 +112,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,

gtk_tree_store_append(store, &iter, &new_parent);

- scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
+ callchain_node__sprintf_value(node, buf, sizeof(buf), total);
gtk_tree_store_set(store, &iter, 0, buf, -1);

callchain_list__sym_name(chain, buf, sizeof(buf), false);
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 2c7436241912..e84ca21252d3 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -34,10 +34,10 @@ static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
return ret;
}

-static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
+static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
+ struct callchain_list *chain,
int depth, int depth_mask, int period,
- u64 total_samples, u64 hits,
- int left_margin)
+ u64 total_samples, int left_margin)
{
int i;
size_t ret = 0;
@@ -50,10 +50,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
else
ret += fprintf(fp, " ");
if (!period && i == depth - 1) {
- double percent;
-
- percent = hits * 100.0 / total_samples;
- ret += percent_color_fprintf(fp, "--%2.2f%%-- ", percent);
+ ret += fprintf(fp, "--");
+ ret += callchain_node__fprintf_value(node, fp, total_samples);
+ ret += fprintf(fp, "--");
} else
ret += fprintf(fp, "%s", " ");
}
@@ -120,10 +119,9 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
left_margin);
i = 0;
list_for_each_entry(chain, &child->val, list) {
- ret += ipchain__fprintf_graph(fp, chain, depth,
+ ret += ipchain__fprintf_graph(fp, child, chain, depth,
new_depth_mask, i++,
total_samples,
- cumul,
left_margin);
}

@@ -143,14 +141,17 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,

if (callchain_param.mode == CHAIN_GRAPH_REL &&
remaining && remaining != total_samples) {
+ struct callchain_node rem_node = {
+ .hit = remaining,
+ };

if (!rem_sq_bracket)
return ret;

new_depth_mask &= ~(1 << (depth - 1));
- ret += ipchain__fprintf_graph(fp, &rem_hits, depth,
+ ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
new_depth_mask, 0, total_samples,
- remaining, left_margin);
+ left_margin);
}

return ret;
@@ -243,12 +244,11 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
struct rb_node *rb_node = rb_first(tree);

while (rb_node) {
- double percent;
-
chain = rb_entry(rb_node, struct callchain_node, rb_node);
- percent = chain->hit * 100.0 / total_samples;

- ret = percent_color_fprintf(fp, " %6.2f%%\n", percent);
+ ret += fprintf(fp, " ");
+ ret += callchain_node__fprintf_value(chain, fp, total_samples);
+ ret += fprintf(fp, "\n");
ret += __callchain__fprintf_flat(fp, chain, total_samples);
ret += fprintf(fp, "\n");
if (++entries_printed == callchain_param.print_limit)
@@ -294,13 +294,13 @@ static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
struct rb_node *rb_node = rb_first(tree);

while (rb_node) {
- double percent;

chain = rb_entry(rb_node, struct callchain_node, rb_node);
- percent = chain->hit * 100.0 / total_samples;

ret += __callchain__fprintf_folded(fp, chain);
- ret += fprintf(fp, " %6.2f%%\n", percent);
+ ret += fprintf(fp, " ");
+ ret += callchain_node__fprintf_value(chain, fp, total_samples);
+ ret += fprintf(fp, "\n");
if (++entries_printed == callchain_param.print_limit)
break;

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 08cb220ba5ea..44184d198855 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -805,6 +805,31 @@ char *callchain_list__sym_name(struct callchain_list *cl,
return bf;
}

+char *callchain_node__sprintf_value(struct callchain_node *node,
+ char *bf, size_t bfsize, u64 total)
+{
+ double percent = 0.0;
+ u64 cumul = callchain_cumul_hits(node);
+
+ if (total)
+ percent = cumul * 100.0 / total;
+
+ scnprintf(bf, bfsize, "%6.2f%%", percent);
+ return bf;
+}
+
+int callchain_node__fprintf_value(struct callchain_node *node,
+ FILE *fp, u64 total)
+{
+ double percent = 0.0;
+ u64 cumul = callchain_cumul_hits(node);
+
+ if (total)
+ percent = cumul * 100.0 / total;
+
+ return percent_color_fprintf(fp, "%.2f%%", percent);
+}
+
static void free_callchain_node(struct callchain_node *node)
{
struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 2f305384531f..3a90a57f6213 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -230,6 +230,10 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,

char *callchain_list__sym_name(struct callchain_list *cl,
char *bf, size_t bfsize, bool show_dso);
+char *callchain_node__sprintf_value(struct callchain_node *node,
+ char *bf, size_t bfsize, u64 total);
+int callchain_node__fprintf_value(struct callchain_node *node,
+ FILE *fp, u64 total);

void free_callchain(struct callchain_root *root);

--
2.6.2

2015-11-02 12:58:31

by Namhyung Kim

Subject: [RFC/PATCH v2 3/4] perf callchain: Add count fields to struct callchain_node

Add these fields to track the number of occurrences of each callchain.
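
The bookkeeping for the new counters mirrors the existing hit/children_hit pair. A simplified sketch of the updates made in split_add_child() below, using a hypothetical struct cnode that carries only the two new counters. It keeps the invariant that splitting a node changes the parent's cumulative count only by the one new sample being added.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified node carrying only the counters this patch adds. */
struct cnode {
    unsigned int count;          /* samples whose chain ends at this node */
    unsigned int children_count; /* samples whose chain continues below */
};

static unsigned int cumul_counts(const struct cnode *n)
{
    return n->count + n->children_count;
}

/*
 * Sketch of the counter updates in split_add_child(): 'parent' is split,
 * its old tail moves into 'tail', and one new sample is recorded either
 * on a new sibling branch or on the shortened parent itself.
 */
static void split_counts(struct cnode *parent, struct cnode *tail,
                         struct cnode *branch, bool new_branch)
{
    assert(parent && tail);

    /* The tail inherits everything the parent had accumulated. */
    tail->count = parent->count;
    tail->children_count = parent->children_count;
    parent->children_count = cumul_counts(tail);

    if (new_branch) {
        assert(branch);
        parent->count = 0;
        parent->children_count += 1;
        branch->count = 1;          /* as add_child() initializes it */
        branch->children_count = 0;
    } else {
        parent->count = 1;          /* the new sample ends at the split */
    }
}
```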

Cc: Brendan Gregg <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/callchain.c | 10 ++++++++++
tools/perf/util/callchain.h | 7 +++++++
2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 44184d198855..0a97d77509bd 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -437,6 +437,8 @@ add_child(struct callchain_node *parent,

new->children_hit = 0;
new->hit = period;
+ new->children_count = 0;
+ new->count = 1;
return new;
}

@@ -484,6 +486,9 @@ split_add_child(struct callchain_node *parent,
parent->children_hit = callchain_cumul_hits(new);
new->val_nr = parent->val_nr - idx_local;
parent->val_nr = idx_local;
+ new->count = parent->count;
+ new->children_count = parent->children_count;
+ parent->children_count = callchain_cumul_counts(new);

/* create a new child for the new branch if any */
if (idx_total < cursor->nr) {
@@ -494,6 +499,8 @@ split_add_child(struct callchain_node *parent,

parent->hit = 0;
parent->children_hit += period;
+ parent->count = 0;
+ parent->children_count += 1;

node = callchain_cursor_current(cursor);
new = add_child(parent, cursor, period);
@@ -516,6 +523,7 @@ split_add_child(struct callchain_node *parent,
rb_insert_color(&new->rb_node_in, &parent->rb_root_in);
} else {
parent->hit = period;
+ parent->count = 1;
}
}

@@ -562,6 +570,7 @@ append_chain_children(struct callchain_node *root,

inc_children_hit:
root->children_hit += period;
+ root->children_count++;
}

static int
@@ -614,6 +623,7 @@ append_chain(struct callchain_node *root,
/* we match 100% of the path, increment the hit */
if (matches == root->val_nr && cursor->pos == cursor->nr) {
root->hit += period;
+ root->count++;
return 0;
}

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3a90a57f6213..2f948f0ff034 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -60,6 +60,8 @@ struct callchain_node {
struct rb_root rb_root_in; /* input tree of children */
struct rb_root rb_root; /* sorted output tree of children */
unsigned int val_nr;
+ unsigned int count;
+ unsigned int children_count;
u64 hit;
u64 children_hit;
};
@@ -145,6 +147,11 @@ static inline u64 callchain_cumul_hits(struct callchain_node *node)
return node->hit + node->children_hit;
}

+static inline int callchain_cumul_counts(struct callchain_node *node)
+{
+ return node->count + node->children_count;
+}
+
int callchain_register_param(struct callchain_param *param);
int callchain_append(struct callchain_root *root,
struct callchain_cursor *cursor,
--
2.6.2

2015-11-02 12:59:19

by Namhyung Kim

Subject: [RFC/PATCH v2 4/4] perf report: Add callchain value option

The -g/--call-graph option now accepts a value type that controls how
callchain values are displayed. Possible values are 'percent', 'period'
and 'count'. 'percent' is the same as before and remains the default.
'period' displays the raw period value rather than the percentage.
'count' displays the number of occurrences.
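
parse_callchain_value() in this patch compares a token against each name only up to the token's own length, so abbreviations are accepted and the first matching name wins. The standalone sketch below reproduces that rule with hypothetical names; for example, "per" selects percent while "peri" selects period. In the real option parser the mode, order and sort-key parsers run first, so a token is claimed by whichever parser matches first.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical copy of the value enum, for illustration only. */
enum value_sketch { SK_PERCENT, SK_PERIOD, SK_COUNT };

/*
 * Same matching rule as parse_callchain_value(): compare only
 * strlen(tok) characters, so any prefix of a name matches and the
 * names are tried in order. Returns 0 on a match, -1 otherwise.
 */
static int parse_value_sketch(const char *tok, enum value_sketch *out)
{
    static const struct {
        const char *name;
        enum value_sketch val;
    } names[] = {
        { "percent", SK_PERCENT },
        { "period",  SK_PERIOD  },
        { "count",   SK_COUNT   },
    };

    assert(tok && out);
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!strncmp(tok, names[i].name, strlen(tok))) {
            *out = names[i].val;
            return 0;
        }
    }
    return -1;
}
```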

$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idle
        |
        ---intel_idle
           cpuidle_enter_state
           cpuidle_enter
           call_cpuidle
           cpu_startup_entry
           |
           |--28.63%-- start_secondary
           |
            --11.30%-- rest_init

$ perf report --no-children --show-total-period --stdio -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idle
        |
        ---intel_idle
           cpuidle_enter_state
           cpuidle_enter
           call_cpuidle
           cpu_startup_entry
           |
           |--9334403-- start_secondary
           |
            --3684302-- rest_init

$ perf report --no-children --show-nr-samples --stdio -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
        |
        ---intel_idle
           cpuidle_enter_state
           cpuidle_enter
           call_cpuidle
           cpu_startup_entry
           |
           |--57-- start_secondary
           |
            --23-- rest_init

Cc: Brendan Gregg <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 13 ++++---
tools/perf/builtin-report.c | 4 +--
tools/perf/ui/stdio/hist.c | 8 +++++
tools/perf/util/callchain.c | 60 +++++++++++++++++++++++++++-----
tools/perf/util/callchain.h | 10 +++++-
tools/perf/util/util.c | 3 +-
6 files changed, 82 insertions(+), 16 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 5ce8da1e1256..bb9fd23a105e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -170,11 +170,11 @@ OPTIONS
Dump raw trace in ASCII.

-g::
---call-graph=<print_type,threshold[,print_limit],order,sort_key,branch>::
+--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
Display call chains using type, min percent threshold, print limit,
- call order, sort key and branch. Note that ordering of parameters is not
- fixed so any parement can be given in an arbitraty order. One exception
- is the print_limit which should be preceded by threshold.
+ call order, sort key, optional branch and value. Note that ordering of
+ parameters is not fixed so any parameter can be given in an arbitrary order.
+ One exception is the print_limit which should be preceded by threshold.

print_type can be either:
- flat: single column, linear exposure of call chains.
@@ -204,6 +204,11 @@ OPTIONS
- branch: include last branch information in callgraph when available.
Usually more convenient to use --branch-history for this.

+ value can be:
+ - percent: display overhead percent (default)
+ - period: display event period
+ - count: display event count
+
--children::
Accumulate callchain of children to parent entry so that then can
show up in the output. The output will have a new "Children" column
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2853ad2bd435..3dd4bb4ded1a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -625,7 +625,7 @@ parse_percent_limit(const struct option *opt, const char *str,
return 0;
}

-#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function"
+#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent"

const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
CALLCHAIN_REPORT_HELP
@@ -708,7 +708,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
"Only display entries with parent-match"),
OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
- "print_type,threshold[,print_limit],order,sort_key[,branch]",
+ "print_type,threshold[,print_limit],order,sort_key[,branch],value",
report_callchain_help, &report_parse_callchain_opt,
callchain_default_opt),
OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index e84ca21252d3..2104b09d41a8 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -88,6 +88,7 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
size_t ret = 0;
int i;
uint entries_printed = 0;
+ int cumul_count = 0;

remaining = total_samples;

@@ -99,6 +100,7 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
child = rb_entry(node, struct callchain_node, rb_node);
cumul = callchain_cumul_hits(child);
remaining -= cumul;
+ cumul_count += callchain_cumul_counts(child);

/*
* The depth mask manages the output of pipes that show
@@ -148,6 +150,12 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
if (!rem_sq_bracket)
return ret;

+ if (callchain_param.value == CCVAL_COUNT) {
+ rem_node.count = child->parent->children_count - cumul_count;
+ if (rem_node.count <= 0)
+ return ret;
+ }
+
new_depth_mask &= ~(1 << (depth - 1));
ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
new_depth_mask, 0, total_samples,
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0a97d77509bd..7f0a89584f1b 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -83,6 +83,23 @@ static int parse_callchain_sort_key(const char *value)
return -1;
}

+static int parse_callchain_value(const char *value)
+{
+ if (!strncmp(value, "percent", strlen(value))) {
+ callchain_param.value = CCVAL_PERCENT;
+ return 0;
+ }
+ if (!strncmp(value, "period", strlen(value))) {
+ callchain_param.value = CCVAL_PERIOD;
+ return 0;
+ }
+ if (!strncmp(value, "count", strlen(value))) {
+ callchain_param.value = CCVAL_COUNT;
+ return 0;
+ }
+ return -1;
+}
+
static int
__parse_callchain_report_opt(const char *arg, bool allow_record_opt)
{
@@ -106,7 +123,8 @@ __parse_callchain_report_opt(const char *arg, bool allow_record_opt)

if (!parse_callchain_mode(tok) ||
!parse_callchain_order(tok) ||
- !parse_callchain_sort_key(tok)) {
+ !parse_callchain_sort_key(tok) ||
+ !parse_callchain_value(tok)) {
/* parsing ok - move on to the next */
try_stack_size = false;
goto next;
@@ -819,12 +837,26 @@ char *callchain_node__sprintf_value(struct callchain_node *node,
char *bf, size_t bfsize, u64 total)
{
double percent = 0.0;
- u64 cumul = callchain_cumul_hits(node);
+ u64 period = callchain_cumul_hits(node);
+ int count = callchain_cumul_counts(node);

if (total)
- percent = cumul * 100.0 / total;
+ percent = period * 100.0 / total;
+ if (callchain_param.mode == CHAIN_FOLDED)
+ count = node->count;

- scnprintf(bf, bfsize, "%6.2f%%", percent);
+ switch (callchain_param.value) {
+ case CCVAL_PERIOD:
+ scnprintf(bf, bfsize, "%"PRIu64, period);
+ break;
+ case CCVAL_COUNT:
+ scnprintf(bf, bfsize, "%u", count);
+ break;
+ case CCVAL_PERCENT:
+ default:
+ scnprintf(bf, bfsize, "%.2f%%", percent);
+ break;
+ }
return bf;
}

@@ -832,12 +864,24 @@ int callchain_node__fprintf_value(struct callchain_node *node,
FILE *fp, u64 total)
{
double percent = 0.0;
- u64 cumul = callchain_cumul_hits(node);
+ u64 period = callchain_cumul_hits(node);
+ int count = callchain_cumul_counts(node);

if (total)
- percent = cumul * 100.0 / total;
-
- return percent_color_fprintf(fp, "%.2f%%", percent);
+ percent = period * 100.0 / total;
+ if (callchain_param.mode == CHAIN_FOLDED)
+ count = node->count;
+
+ switch (callchain_param.value) {
+ case CCVAL_PERIOD:
+ return fprintf(fp, "%"PRIu64, period);
+ case CCVAL_COUNT:
+ return fprintf(fp, "%u", count);
+ case CCVAL_PERCENT:
+ default:
+ return percent_color_fprintf(fp, "%.2f%%", percent);
+ }
+ return 0;
}

static void free_callchain_node(struct callchain_node *node)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 2f948f0ff034..e8533e328a47 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -29,7 +29,8 @@
HELP_PAD "print_limit:\tmaximum number of call graph entry (<number>)\n" \
HELP_PAD "order:\t\tcall graph order (caller|callee)\n" \
HELP_PAD "sort_key:\tcall graph sort key (function|address)\n" \
- HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n"
+ HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n" \
+ HELP_PAD "value:\t\tcall graph value (percent|period|count)\n"

enum perf_call_graph_mode {
CALLCHAIN_NONE,
@@ -81,6 +82,12 @@ enum chain_key {
CCKEY_ADDRESS
};

+enum chain_value {
+ CCVAL_PERCENT,
+ CCVAL_PERIOD,
+ CCVAL_COUNT,
+};
+
struct callchain_param {
bool enabled;
enum perf_call_graph_mode record_mode;
@@ -93,6 +100,7 @@ struct callchain_param {
bool order_set;
enum chain_key key;
bool branch_callstack;
+ enum chain_value value;
};

extern struct callchain_param callchain_param;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index cd12c25e4ea4..174912f87913 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -20,7 +20,8 @@ struct callchain_param callchain_param = {
.mode = CHAIN_GRAPH_ABS,
.min_percent = 0.5,
.order = ORDER_CALLEE,
- .key = CCKEY_FUNCTION
+ .key = CCKEY_FUNCTION,
+ .value = CCVAL_PERCENT,
};

/*
--
2.6.2

2015-11-02 20:38:03

by Brendan Gregg

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

G'Day Namhyung,

On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> Hello,
>
> This is what Brendan requested on the perf-users mailing list [1] to
> support FlameGraphs [2] more efficiently. This patchset adds a few
> more callchain options to adjust the output for it.
>
> At first, 'folded' output mode was added. The folded output puts all
> calchain nodes in a line separated by semicolons, a space and the
> value. Now it only supports --stdio as other UI provides some way of
> folding/expanding callchains dynamically.
>
> The value is now can be one of 'percent', 'period', or 'count'. The
> percent is current default output and the period is the raw number of
> sample periods. The count is the number of samples for each callchain.
>
> Here's an example:
>
> $ perf report --no-children --show-nr-samples --stdio -g folded,count
> ...
> 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

Thanks!

So for the folded output I don't need the summary line (the row of
columns printed by hist_entry__snprintf()), and don't need anything
except folded stacks and the counts. If working with the existing
stdio interface is making it harder than it needs to be, might it be
easier to make it a separate interface (ui/folded), that just emitted
the folded output? Just an idea. This existing patchset is working for
me, I'd just be filtering the output.
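
Filtering the current output down to folded stacks could look roughly like the heuristic below. It assumes, without any guarantee from perf, that header lines start with '#', hist entry summary lines start with the overhead percentage, and folded lines (value last, as in this patchset) start with a symbol name; is_folded_stack_line() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical heuristic: return true only for lines that look like
 * folded stacks in 'perf report --stdio -g folded' output.
 */
static bool is_folded_stack_line(const char *line)
{
    const char *p = line;
    size_t n;

    assert(line);
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0' || *p == '\n' || *p == '#')
        return false; /* blank line or '#' header/comment */

    /* A summary line starts with a token such as "39.93%". */
    n = strcspn(p, " \t\n");
    if (n > 1 && p[n - 1] == '%' && isdigit((unsigned char)p[0]))
        return false;
    return true;
}
```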

Having the option for percentages and periods is nice. I can envisage
using periods (for latency flame graphs).

Brendan

2015-11-02 21:30:26

by Arnaldo Carvalho de Melo

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg wrote:
> G'Day Namhyung,
>
> On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > Hello,
> >
> > This is what Brendan requested on the perf-users mailing list [1] to
> > support FlameGraphs [2] more efficiently. This patchset adds a few
> > more callchain options to adjust the output for it.
> >
> > At first, 'folded' output mode was added. The folded output puts all
> > calchain nodes in a line separated by semicolons, a space and the
> > value. Now it only supports --stdio as other UI provides some way of
> > folding/expanding callchains dynamically.
> >
> > The value is now can be one of 'percent', 'period', or 'count'. The
> > percent is current default output and the period is the raw number of
> > sample periods. The count is the number of samples for each callchain.
> >
> > Here's an example:
> >
> > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > ...
> > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
>
> Thanks!
>
> So for the folded output I don't need the summary line (the row of
> columns printed by hist_entry__snprintf()), and don't need anything
> except folded stacks and the counts. If working with the existing
> stdio interface is making it harder than it needs to be, might it be

I don't think so; just add a flag asking for the hist_entry__snprintf()
output to be suppressed. Any ideas for a long option name?

Having it as Namhyung did may have value for some people as a more
compact way to show the callchains together with the hist_entry line.

With this in mind, do you have any other issues with Namhyung's
patchkit? An acked-by/tested-by you would be nice to have, and then we
could work out the new option to suppress that hist_entry__snprintf()
in a follow up patch.

> easier to make it a separate interface (ui/folded), that just emitted
> the folded output? Just an idea. This existing patchset is working for
> me, I'd just be filtering the output.
>
> Having the option for percentages and periods is nice. I can envisage
> using periods (for latency flame graphs).

You mean in the callchain lines?

- Arnaldo

2015-11-02 22:12:24

by Namhyung Kim

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

Hi Arnaldo,

On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg wrote:
> > G'Day Namhyung,
> >
> > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > > Hello,
> > >
> > > This is what Brendan requested on the perf-users mailing list [1] to
> > > support FlameGraphs [2] more efficiently. This patchset adds a few
> > > more callchain options to adjust the output for it.
> > >
> > > At first, 'folded' output mode was added. The folded output puts all
> > > calchain nodes in a line separated by semicolons, a space and the
> > > value. Now it only supports --stdio as other UI provides some way of
> > > folding/expanding callchains dynamically.
> > >
> > > The value is now can be one of 'percent', 'period', or 'count'. The
> > > percent is current default output and the period is the raw number of
> > > sample periods. The count is the number of samples for each callchain.
> > >
> > > Here's an example:
> > >
> > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > ...
> > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> >
> > Thanks!
> >
> > So for the folded output I don't need the summary line (the row of
> > columns printed by hist_entry__snprintf()), and don't need anything
> > except folded stacks and the counts. If working with the existing
> > stdio interface is making it harder than it needs to be, might it be
>
> I don't think it so, just add some flag asking for that
> hist_entry__snprintf() to be supressed, ideas for a long option name?
>
> Having it as Namhyung did may have value for some people as a more
> compact way to show the callchains together with the hist_entry line.

Yeah, I'd keep the hist entry line unless it's too hard to
parse/filter. IMHO it's just a way to show callchains, so there's no
need for a separate output mode.
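A quick way to check the claim that the summary line is easy to filter is a small consumer-side sketch. It assumes, based only on the examples in this thread, that hist entry (summary) lines start with a percentage and header lines start with "#". It is not part of perf.

```python
import re

# Assumption: hist entry ("summary") lines in `perf report --stdio`
# output begin with a percentage such as "39.93%", and perf's header
# lines begin with "#". Any other non-blank line is kept as a folded
# callchain line.
HIST_LINE = re.compile(r"^\d+(\.\d+)?%\s")


def folded_lines(report_text):
    """Yield only the folded callchain lines from perf report output."""
    for line in report_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and header comments
        if HIST_LINE.match(line):
            continue  # the summary line printed for the hist entry
        yield line
```

A folded line in percent mode ends with a value such as "28.63%" but starts with a frame name, so the start-of-line match does not drop it.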

Brendan, I guess you still need to know other info like cpu or pid, no?

And I feel like it'd be better to put the count before the callchains
for consistency, like below. Is that OK with you?

$ perf report --no-children --show-nr-samples --stdio -g folded,count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
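Whichever order perf ends up printing, a consumer can convert cheaply between the two layouts. FlameGraph's flamegraph.pl reads "stack count" lines (value last), so count-first output would need a swap like the one in this minimal sketch. The helper name is illustrative only.

```python
import re

# A leading value: a raw count/period ("57") or a percentage ("28.63%").
VALUE = re.compile(r"^\d+(\.\d+)?%?$")


def to_count_last(line):
    """Rewrite '57 a;b;c' as 'a;b;c 57' (stack first, value last).

    Lines that do not start with a value are returned unchanged, so it
    is safe to run over output that is already in count-last form.
    """
    line = line.strip()
    head, sep, rest = line.partition(" ")
    if sep and VALUE.match(head):
        return f"{rest} {head}"
    return line
```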


>
> With this in mind, do you have any other issues with Namhyung's
> patchkit? An acked-by/tested-by you would be nice to have, and then we
> could work out the new option to suppress that hist_entry__snprintf()
> in a follow up patch.
>
> > easier to make it a separate interface (ui/folded), that just emitted
> > the folded output? Just an idea. This existing patchset is working for
> > me, I'd just be filtering the output.
> >
> > Having the option for percentages and periods is nice. I can envisage
> > using periods (for latency flame graphs).

Glad to see you like it. :)

Thanks,
Namhyung

2015-11-02 22:28:50

by Arnaldo Carvalho de Melo

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

Hi Namhyung,

Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > support FlameGraphs [2] more efficiently. This patchset adds a few
> > > > more callchain options to adjust the output for it.

> > > > At first, 'folded' output mode was added. The folded output puts all
> > > > calchain nodes in a line separated by semicolons, a space and the
> > > > value. Now it only supports --stdio as other UI provides some way of
> > > > folding/expanding callchains dynamically.

> > > > The value is now can be one of 'percent', 'period', or 'count'. The
> > > > percent is current default output and the period is the raw number of
> > > > sample periods. The count is the number of samples for each callchain.

> > > > Here's an example:

> > > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > ...
> > > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

> > > So for the folded output I don't need the summary line (the row of
> > > columns printed by hist_entry__snprintf()), and don't need anything
> > > except folded stacks and the counts. If working with the existing
> > > stdio interface is making it harder than it needs to be, might it be

> > I don't think it so, just add some flag asking for that
> > hist_entry__snprintf() to be supressed, ideas for a long option name?

> > Having it as Namhyung did may have value for some people as a more
> > compact way to show the callchains together with the hist_entry line.

> Yeah, I'd keep the hist entry line unless it's too hard to
> parse/filter. IMHO it's just a way to show callchains, so no need to

What I suggested was to have something like:

$ perf report --no-children --no-hists --stdio -g folded,count
                            ^^^^^^^^^^
                            ^^^^^^^^^^
...
intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

I.e. the first entry in the callchain is 'intel_idle', just as in what
Brendan called the 'summary line', so that line is redundant when all he
wants is the callchains and how many times each was sampled.
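The phrase "all the callchains and how many times they were sampled" can be made concrete with a sketch of the collapse step. It follows the spirit of FlameGraph's stackcollapse scripts, not perf's own callchain code: identical stacks are joined with ';' and counted.

```python
from collections import Counter


def fold(samples):
    """Collapse sampled stacks into 'frame;frame;... count' lines.

    Each sample is a sequence of frame names in the order they should
    appear on the line. Identical stacks are merged and counted, which
    is the information the folded output carries per callchain.
    """
    counts = Counter(";".join(frames) for frames in samples)
    # Highest count first, ties broken by stack text for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{stack} {n}" for stack, n in ordered]
```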

> have separate output mode..

> Brendan, I guess you still need to know other info like cpu or pid, no?

Possibly, but just with the callchains he has enough info for the basic
flame graph, no?

> And I feel like it'd be better to put the count before the callchains
> for consistency like below. Is it OK to you?

Consistency with what?

The main thing here is the callchain; everything else is related to
it, so showing it first makes sense to me.

Having some way to list the desired info for each callchain may be
interesting, and if he could do it like:

-g folded,count,cpu,other,fields

then he would know how to parse the per-callchain info at the end of
each line, right?
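This idea implies a simple contract: the field list given on the command line tells the consumer how to read the tail of each line. Below is a sketch of such a consumer. It assumes the hypothetical (not implemented) syntax above, with one space between the stack and a comma-separated value list.

```python
def parse_folded(line, field_names):
    """Split 'a;b;c v1,v2,...' into (frames, {field: value}).

    field_names lists the fields requested after 'folded' (for example
    ["count", "cpu"] for the hypothetical '-g folded,count,cpu'), in
    the order they are printed.
    """
    # Split on the LAST space so frames containing spaces stay intact.
    stack, sep, values = line.strip().rpartition(" ")
    if not sep or not stack:
        raise ValueError(f"not a folded line: {line!r}")
    parts = values.split(",")
    if len(parts) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(parts)}")
    return stack.split(";"), dict(zip(field_names, parts))
```

Splitting on the last space keeps frame names that contain spaces (some C++ symbols, for instance) in one piece, as long as the trailing field list itself has no spaces.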

- Arnaldo

>
> $ perf report --no-children --show-nr-samples --stdio -g folded,count
> ...
> 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> 57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
> 23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
>
>
> >
> > With this in mind, do you have any other issues with Namhyung's
> > patchkit? An acked-by/tested-by you would be nice to have, and then we
> > could work out the new option to suppress that hist_entry__snprintf()
> > in a follow up patch.
> >
> > > easier to make it a separate interface (ui/folded), that just emitted
> > > the folded output? Just an idea. This existing patchset is working for
> > > me, I'd just be filtering the output.
> > >
> > > Having the option for percentages and periods is nice. I can envisage
> > > using periods (for latency flame graphs).
>
> Glad to see you like it. :)
>
> Thanks,
> Namhyung

2015-11-02 22:43:36

by Brendan Gregg

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Mon, Nov 2, 2015 at 2:12 PM, Namhyung Kim <[email protected]> wrote:
> Hi Arnaldo,
>
> On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
>> > G'Day Namhyung,
>> >
>> > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
>> > > Hello,
>> > >
>> > > This is what Brendan requested on the perf-users mailing list [1] to
>> > > support FlameGraphs [2] more efficiently. This patchset adds a few
>> > > more callchain options to adjust the output for it.
>> > >
>> > > At first, 'folded' output mode was added. The folded output puts all
>> > > calchain nodes in a line separated by semicolons, a space and the
>> > > value. Now it only supports --stdio as other UI provides some way of
>> > > folding/expanding callchains dynamically.
>> > >
>> > > The value is now can be one of 'percent', 'period', or 'count'. The
>> > > percent is current default output and the period is the raw number of
>> > > sample periods. The count is the number of samples for each callchain.
>> > >
>> > > Here's an example:
>> > >
>> > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
>> > > ...
>> > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
>> > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
>> > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
>> >
>> > Thanks!
>> >
>> > So for the folded output I don't need the summary line (the row of
>> > columns printed by hist_entry__snprintf()), and don't need anything
>> > except folded stacks and the counts. If working with the existing
>> > stdio interface is making it harder than it needs to be, might it be
>>
>> I don't think it so, just add some flag asking for that
>> hist_entry__snprintf() to be supressed, ideas for a long option name?
>>
>> Having it as Namhyung did may have value for some people as a more
>> compact way to show the callchains together with the hist_entry line.
>
> Yeah, I'd keep the hist entry line unless it's too hard to
> parse/filter. IMHO it's just a way to show callchains, so no need to
> have separate output mode..

Ok, good point, it can be thought of as a different stack representation format.

>
> Brendan, I guess you still need to know other info like cpu or pid, no?
>

Yes, I just realized that I include either the process name (Command
column) or name-PID as the first folded element. E.g., the output can be:

mkdir;getopt_long;page_fault;do_page_fault;__do_page_fault;filemap_map_pages 3

Or:

mkdir-21918;getopt_long;page_fault;do_page_fault;__do_page_fault;filemap_map_pages 2

Usually the first, but sometimes it's helpful to split on PID as well.

As for what to call such options (which may be a follow-on patch
anyway) ... maybe something like:

"folded": fold stacks as single lines
"nameonly,folded": suppress summary line and include process name in
the folded stack
"pidonly,folded": suppress summary line and include process_name-PID
in the folded stack
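The prefixing half of these suggestions is easy to picture in a sketch. Suppressing the summary line is a separate matter and is not shown. The mode names come from the list above and are hypothetical, not existing perf options. Frames are assumed to be caller-first, so the process ends up at the root, as in the mkdir examples.

```python
def fold_with_process(comm, pid, frames, count, mode="folded"):
    """Build one folded line, optionally led by the process.

    Hypothetical modes: 'folded' adds nothing, 'nameonly' prepends the
    command name, 'pidonly' prepends 'command-PID'.
    """
    if mode == "nameonly":
        frames = [comm, *frames]
    elif mode == "pidonly":
        frames = [f"{comm}-{pid}", *frames]
    elif mode != "folded":
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{';'.join(frames)} {count}"
```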

> And I feel like it'd be better to put the count before the callchains
> for consistency like below. Is it OK to you?
>
> $ perf report --no-children --show-nr-samples --stdio -g folded,count
> ...
> 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> 57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
> 23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
>

If it were printing with the perf report summary, sure, but if we have
a way to emit only the folded output, then counts last would be perfect
and maybe a bit more intuitive (key then value).

>
>>
>> With this in mind, do you have any other issues with Namhyung's
>> patchkit? An acked-by/tested-by you would be nice to have, and then we
>> could work out the new option to suppress that hist_entry__snprintf()
>> in a follow up patch.

Acked and tested, yes.

Looks like I'd be using caller ordering, e.g., to get lines like this:

__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version 91

Which I can do just by using "-g folded,count,caller".
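The default examples in this thread are callee-first: the sampled function, such as intel_idle, comes first. Flame graphs build each tower up from the outermost caller, which is why caller ordering fits. With "-g folded,count,caller", perf emits that order directly. For output already captured callee-first, the reversal is short, as this sketch shows:

```python
def callee_to_caller(line):
    """Reverse the frames of a callee-first folded line, keeping its value.

    'intel_idle;cpuidle_enter;start_secondary 57' becomes
    'start_secondary;cpuidle_enter;intel_idle 57'.
    """
    stack, sep, value = line.strip().rpartition(" ")
    if not sep:
        raise ValueError(f"not a folded line: {line!r}")
    return f"{';'.join(reversed(stack.split(';')))} {value}"
```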

>>
>> > easier to make it a separate interface (ui/folded), that just emitted
>> > the folded output? Just an idea. This existing patchset is working for
>> > me, I'd just be filtering the output.
>> >
>> > Having the option for percentages and periods is nice. I can envisage
>> > using periods (for latency flame graphs).
>
> Glad to see you like it. :)
>
> Thanks,
> Namhyung

2015-11-02 22:49:44

by Namhyung Kim

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Namhyung,
>
> Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > support FlameGraphs [2] more efficiently. This patchset adds a few
> > > > > more callchain options to adjust the output for it.
>
> > > > > At first, 'folded' output mode was added. The folded output puts all
> > > > > calchain nodes in a line separated by semicolons, a space and the
> > > > > value. Now it only supports --stdio as other UI provides some way of
> > > > > folding/expanding callchains dynamically.
>
> > > > > The value is now can be one of 'percent', 'period', or 'count'. The
> > > > > percent is current default output and the period is the raw number of
> > > > > sample periods. The count is the number of samples for each callchain.
>
> > > > > Here's an example:
>
> > > > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > ...
> > > > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
>
> > > > So for the folded output I don't need the summary line (the row of
> > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > except folded stacks and the counts. If working with the existing
> > > > stdio interface is making it harder than it needs to be, might it be
>
> > > I don't think it so, just add some flag asking for that
> > > hist_entry__snprintf() to be supressed, ideas for a long option name?
>
> > > Having it as Namhyung did may have value for some people as a more
> > > compact way to show the callchains together with the hist_entry line.
>
> > Yeah, I'd keep the hist entry line unless it's too hard to
> > parse/filter. IMHO it's just a way to show callchains, so no need to
>
> What I suggested was to have something like:
>
> $ perf report --no-children --no-hists --stdio -g folded,count
> ^^^^^^^^^^
> ^^^^^^^^^^
> ...
> intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
>
> I.e. the first entry in the callchain is 'intel_idle', just like in what
> Brendan called the 'summary line', i.e. reduntant when what he wants its
> just all the callchains and how many times they were sampled.

Yep, I know. But isn't 'perf report' all about seeing hist lines? :)

I'm not insisting strongly, but it feels a bit strange to me if perf
report doesn't show any hist lines.


>
> > have separate output mode..
>
> > Brendan, I guess you still need to know other info like cpu or pid, no?
>
> Possibly, but just with the callchains he has enough info for the basic
> flame graph, no?
>
> > And I feel like it'd be better to put the count before the callchains
> > for consistency like below. Is it OK to you?
>
> Consistency with what?

Oh, I meant consistency with the other callchain output styles like graph,
fractal or flat: they all show the numbers before the callchains. And I
think it's easier for humans to read. :)


>
> The main thing here is the callchain, all the other stuff are things
> related to it, so showing it first makes sense to me.
>
> Having some way to list the desired info to have for each callchain may
> be interesting, and if he could do it like:
>
> -g folded,count,cpu,other,fields
>
> then he would know how to parse the per-callchain info at the end of
> each line, right?

Hmm.. it looks like that ends up with redundant info. I don't think
it's generally useful for other 'perf report' use cases. Wouldn't it be
better to just add minimal support and let the external tool parse the
output?
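With the "minimal support, let the external tool parse it" approach, a consumer can also recover the process name. It reads the command from each hist line and attaches it to the folded lines below it. The sketch below depends on the column layout in this thread's --show-nr-samples example. That layout is an assumption: other sort keys or options change the columns. With caller ordering, the prefix lands at the root of each stack.

```python
import re

HIST_LINE = re.compile(r"^\d+(\.\d+)?%\s")


def folded_with_comm(report_text, comm_field=2):
    """Prefix each folded line with the command of its hist entry.

    Assumes hist lines shaped like
    '39.93% 80 swapper [kernel.vmlinux] [k] intel_idle', where the
    command is the third whitespace-separated field (index 2). Command
    names containing spaces would break this simple split.
    """
    comm = None
    out = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if HIST_LINE.match(line):
            comm = line.split()[comm_field]  # remember the current entry
        elif comm is not None:
            out.append(f"{comm};{line}")
    return out
```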

Thanks,
Namhyung

2015-11-02 23:04:45

by Arnaldo Carvalho de Melo

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

Em Tue, Nov 03, 2015 at 07:49:27AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > Hi Namhyung,
> >
> > Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > > support FlameGraphs [2] more efficiently. This patchset adds a few
> > > > > > more callchain options to adjust the output for it.
> >
> > > > > > At first, 'folded' output mode was added. The folded output puts all
> > > > > > calchain nodes in a line separated by semicolons, a space and the
> > > > > > value. Now it only supports --stdio as other UI provides some way of
> > > > > > folding/expanding callchains dynamically.
> >
> > > > > > The value is now can be one of 'percent', 'period', or 'count'. The
> > > > > > percent is current default output and the period is the raw number of
> > > > > > sample periods. The count is the number of samples for each callchain.
> >
> > > > > > Here's an example:
> >
> > > > > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > > ...
> > > > > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> >
> > > > > So for the folded output I don't need the summary line (the row of
> > > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > > except folded stacks and the counts. If working with the existing
> > > > > stdio interface is making it harder than it needs to be, might it be
> >
> > > > I don't think it so, just add some flag asking for that
> > > > hist_entry__snprintf() to be supressed, ideas for a long option name?
> >
> > > > Having it as Namhyung did may have value for some people as a more
> > > > compact way to show the callchains together with the hist_entry line.
> >
> > > Yeah, I'd keep the hist entry line unless it's too hard to
> > > parse/filter. IMHO it's just a way to show callchains, so no need to
> >
> > What I suggested was to have something like:
> >
> > $ perf report --no-children --no-hists --stdio -g folded,count
> > ^^^^^^^^^^
> > ^^^^^^^^^^
> > ...
> > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> >
> > I.e. the first entry in the callchain is 'intel_idle', just like in what
> > Brendan called the 'summary line', i.e. reduntant when what he wants its
> > just all the callchains and how many times they were sampled.
>
> Yep, I know. But isn't 'perf report' all for seeing hist lines? :)

Well, so far, yes, but he is presenting a use case where what we want to
see is just callchains, and we can achieve that rather easily, no?

> I'm not insisting it strongly, but it's a bit strange for me if perf
> report doesn't show any hist lines..

If that is of no use in this use case, why not?

> > > have separate output mode..
> >
> > > Brendan, I guess you still need to know other info like cpu or pid, no?
> >
> > Possibly, but just with the callchains he has enough info for the basic
> > flame graph, no?
> >
> > > And I feel like it'd be better to put the count before the callchains
> > > for consistency like below. Is it OK to you?
> >
> > Consistency with what?
>
> Oh, I meant consistency with other callchain output style like graph,
> fractal or flat - They all show the numbers before callchains. And I
> think it's easier to read for human. :)

Well, as I said, isn't the main object here the callchain? :-)

And Brendan's request is for something to be consumed by scripts, i.e.
something like we have for perf stat:

For humans:

[root@felicio ~]# perf stat -e cycles -I 1000 -a
#           time             counts unit events
     1.000304391          1,820,038      cycles
     2.000490191      1,005,477,007      cycles
     3.000657813          1,717,007      cycles
^C     3.917890293          2,804,034      cycles

For machines/scripts:

[root@felicio ~]# perf stat -x, -e cycles -I 1000 -a
1.000291954,1923360,,cycles,3998167210,100.00
2.000477154,1005608105,,cycles,3998475482,100.00
3.000612612,1345483,,cycles,3998332391,100.00
4.000744469,1005046913,,cycles,3998258199,100.00
^C 4.331684347,1551327,,cycles,3463190970,100.00

[root@felicio ~]#
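The value of the -x mode is that a script can consume it without guessing at column widths. Below is a sketch of such a consumer for the interval output shown above. It assumes the output was captured to a file (perf stat prints to stderr), so the terminal's ^C echo is absent. The names of the last two fields are an assumption based on perf-stat(1).

```python
import csv
import io

# Field order as it appears in the -x, sample above; "run_time" and
# "run_pct" (counter run time, percentage of time it was running) are
# assumed names taken from perf-stat(1), check them for your version.
FIELDS = ["time", "count", "unit", "event", "run_time", "run_pct"]


def parse_stat_csv(text, sep=","):
    """Parse interval `perf stat -x<sep> -I ...` output into dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter=sep):
        if not rec or rec[0].startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, rec))
        row["time"] = float(row["time"])
        row["count"] = int(row["count"])  # "<not counted>" would raise here
        rows.append(row)
    return rows
```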


> > The main thing here is the callchain, all the other stuff are things
> > related to it, so showing it first makes sense to me.
> >
> > Having some way to list the desired info to have for each callchain may
> > be interesting, and if he could do it like:
> >
> > -g folded,count,cpu,other,fields
> >
> > then he would know how to parse the per-callchain info at the end of
> > each line, right?
>
> Hmm.. looks like that it ends up having redundant info. I don't think

What is redundant, and with what?

> it's generally useful to other 'perf report' stuffs. Wouldn't it be
> better just adding minimal support and let the external tool parse the
> output?

Oh well, perhaps we could have a 'perf callchain' tool that would be
centered on callchains and would provide one line per callchain, which
would have:

callchain;semicolon;separated series,of,desired,fields,for,this,callchain

Which would heavily reuse the 'perf report' / 'perf top' code for
histograms, no?

I still think that this is a 'perf report' thing, but one that is
centered on callchains, and that is to be consumed by scripts, not
humans.

- Arnaldo

2015-11-02 23:46:21

by Namhyung Kim

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 07:49:27AM +0900, Namhyung Kim escreveu:
> > On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Hi Namhyung,
> > >
> > > Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > > > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > > > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > > > support FlameGraphs [2] more efficiently. This patchset adds a few
> > > > > > > more callchain options to adjust the output for it.
> > >
> > > > > > > At first, 'folded' output mode was added. The folded output puts all
> > > > > > > calchain nodes in a line separated by semicolons, a space and the
> > > > > > > value. Now it only supports --stdio as other UI provides some way of
> > > > > > > folding/expanding callchains dynamically.
> > >
> > > > > > > The value is now can be one of 'percent', 'period', or 'count'. The
> > > > > > > percent is current default output and the period is the raw number of
> > > > > > > sample periods. The count is the number of samples for each callchain.
> > >
> > > > > > > Here's an example:
> > >
> > > > > > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > > > ...
> > > > > > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > > > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > >
> > > > > > So for the folded output I don't need the summary line (the row of
> > > > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > > > except folded stacks and the counts. If working with the existing
> > > > > > stdio interface is making it harder than it needs to be, might it be
> > >
> > > > > I don't think it so, just add some flag asking for that
> > > > > hist_entry__snprintf() to be supressed, ideas for a long option name?
> > >
> > > > > Having it as Namhyung did may have value for some people as a more
> > > > > compact way to show the callchains together with the hist_entry line.
> > >
> > > > Yeah, I'd keep the hist entry line unless it's too hard to
> > > > parse/filter. IMHO it's just a way to show callchains, so no need to
> > >
> > > What I suggested was to have something like:
> > >
> > > $ perf report --no-children --no-hists --stdio -g folded,count
> > > ^^^^^^^^^^
> > > ^^^^^^^^^^
> > > ...
> > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > >
> > > I.e. the first entry in the callchain is 'intel_idle', just like in what
> > > Brendan called the 'summary line', i.e. reduntant when what he wants its
> > > just all the callchains and how many times they were sampled.
> >
> > Yep, I know. But isn't 'perf report' all for seeing hist lines? :)
>
> Well, so far, yes, but he is presenting a usecase where what we want to
> see is just callchains, and we can achieve that rather easily, no?

But it's also easy to filter from the script side.


>
> > I'm not insisting it strongly, but it's a bit strange for me if perf
> > report doesn't show any hist lines..
>
> If that is of no use in this use case, why not?

Well, I think FlameGraphs are a rather unusual case, and folded output
seems useful for other use cases too.


>
> > > > have separate output mode..
> > >
> > > > Brendan, I guess you still need to know other info like cpu or pid, no?
> > >
> > > Possibly, but just with the callchains he has enough info for the basic
> > > flame graph, no?
> > >
> > > > And I feel like it'd be better to put the count before the callchains
> > > > for consistency like below. Is it OK to you?
> > >
> > > Consistency with what?
> >
> > Oh, I meant consistency with other callchain output style like graph,
> > fractal or flat - They all show the numbers before callchains. And I
> > think it's easier to read for human. :)
>
> Well, As I said, isn't the main object here the callchain? :-)
>
> And Brendan's request is for a something to be consumed by scripts, i.e.
> something like we have for perf stat:
>
> For humans:
>
> [root@felicio ~]# perf stat -e cycles -I 1000 -a
> # time counts unit events
> 1.000304391 1,820,038 cycles
> 2.000490191 1,005,477,007 cycles
> 3.000657813 1,717,007 cycles
> ^C 3.917890293 2,804,034 cycles
>
> For machines/scripts:
>
> [root@felicio ~]# perf stat -x, -e cycles -I 1000 -a
> 1.000291954,1923360,,cycles,3998167210,100.00
> 2.000477154,1005608105,,cycles,3998475482,100.00
> 3.000612612,1345483,,cycles,3998332391,100.00
> 4.000744469,1005046913,,cycles,3998258199,100.00
> ^C 4.331684347,1551327,,cycles,3463190970,100.00
>
> [root@felicio ~]#

Yes, I thought about it too. Maybe the -t/--field-separator option could
be used to separate folded callchains too.

>
>
> > > The main thing here is the callchain, all the other stuff are things
> > > related to it, so showing it first makes sense to me.
> > >
> > > Having some way to list the desired info to have for each callchain may
> > > be interesting, and if he could do it like:
> > >
> > > -g folded,count,cpu,other,fields
> > >
> > > then he would know how to parse the per-callchain info at the end of
> > > each line, right?
> >
> > Hmm.. looks like that it ends up having redundant info. I don't think
>
> What is redundant, and with with what?

When it's used in normal perf report cases, the other info in the
callchain lines is redundant with the hist lines. Also, if a hist entry
has many callchains, each callchain line will have the same info in the
other fields.


>
> > it's generally useful to other 'perf report' stuffs. Wouldn't it be
> > better just adding minimal support and let the external tool parse the
> > output?
>
> Oh well, perhaps we could have a 'perf callchain' tool that would be
> centered on callchains and would provided one line per callchain, which
> would have:
>
> callchain;seprarated;colons series,of,desired,fields,for,this,callchain
>
> Which would reuse heavily the 'perf report' / 'perf top' code for
> histograms, no?

I guess the callchain code is pretty isolated or can be isolated
easily though.


>
> I still think that this is a 'perf report' thing, but one that is
> centered in callchains, and that is to be consumed by scripts, not
> humans.

Agreed.

I'm just looking for a way to support it with minimal change. :)

Thanks,
Namhyung

2015-11-03 00:46:57

by Arnaldo Carvalho de Melo

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

Em Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 03, 2015 at 07:49:27AM +0900, Namhyung Kim escreveu:
> > > On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > > > > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <[email protected]> wrote:
> > > > > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > > > > support FlameGraphs [2] more efficiently. This patchset adds a few
> > > > > > > > more callchain options to adjust the output for it.
> > > >
> > > > > > > > At first, 'folded' output mode was added. The folded output puts all
> > > > > > > > calchain nodes in a line separated by semicolons, a space and the
> > > > > > > > value. Now it only supports --stdio as other UI provides some way of
> > > > > > > > folding/expanding callchains dynamically.
> > > >
> > > > > > > > The value is now can be one of 'percent', 'period', or 'count'. The
> > > > > > > > percent is current default output and the period is the raw number of
> > > > > > > > sample periods. The count is the number of samples for each callchain.
> > > >
> > > > > > > > Here's an example:
> > > >
> > > > > > > > $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > > > > ...
> > > > > > > > 39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
> > > > > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

> > > > > > > So for the folded output I don't need the summary line (the row of
> > > > > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > > > > except folded stacks and the counts. If working with the existing
> > > > > > > stdio interface is making it harder than it needs to be, might it be

> > > > > > I don't think it so, just add some flag asking for that
> > > > > > hist_entry__snprintf() to be supressed, ideas for a long option name?

> > > > > > Having it as Namhyung did may have value for some people as a more
> > > > > > compact way to show the callchains together with the hist_entry line.

> > > > > Yeah, I'd keep the hist entry line unless it's too hard to
> > > > > parse/filter. IMHO it's just a way to show callchains, so no need to

> > > > What I suggested was to have something like:

> > > > $ perf report --no-children --no-hists --stdio -g folded,count
> > > > ^^^^^^^^^^
> > > > ^^^^^^^^^^
> > > > ...
> > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > > >
> > > > I.e. the first entry in the callchain is 'intel_idle', just as in what
> > > > Brendan called the 'summary line', which makes that line redundant when
> > > > all he wants is the callchains and how many times each was sampled.

> > > Yep, I know. But isn't 'perf report' all for seeing hist lines? :)

> > Well, so far, yes, but he is presenting a use case where what we want to
> > see is just callchains, and we can achieve that rather easily, no?

> But it's also easy to filter from the script side.
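Dropping the summary lines on the script side can be illustrated with a small filter. This is a heuristic sketch built on the following assumptions:

- Hist-entry lines begin with an overhead percentage, as in the v2 example.
- Folded lines carry their value at the end.
- perf's stdio header lines start with '#'.
- `folded_only` is a hypothetical name.

```python
import re

# Assumption: hist-entry (summary) lines start with an overhead percentage
# such as "39.93%"; folded lines start with a frame name instead.
HIST_LINE = re.compile(r"^\d+(?:\.\d+)?%\s")

def folded_only(report_text):
    """Keep only the folded callchain lines of 'perf report -g folded' output."""
    kept = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' header/comment lines
        if HIST_LINE.match(line):
            continue  # the hist_entry__snprintf() summary line
        kept.append(line)
    return kept
```

The heuristic depends on the value staying at the end of folded lines. If the value were moved to the front, as suggested elsewhere in the thread, both kinds of line would start with a number and this filter could no longer tell them apart.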

Why not go all the way and provide just what the script wants?

> > > I'm not insisting on it strongly, but it feels a bit strange to me if perf
> > > report doesn't show any hist lines..
> >
> > If that is of no use in this use case, why not?
>
> Well, I think FlameGraphs is a rather unusual case and folded output
> seems useful to other use cases too.

Sure thing, I agree with that; it's just one flag to tell whether
hist_entry__snprintf() should be used or not.

> > > > > have separate output mode..
> > > >
> > > > > Brendan, I guess you still need to know other info like cpu or pid, no?
> > > >
> > > > Possibly, but just with the callchains he has enough info for the basic
> > > > flame graph, no?
> > > >
> > > > > And I feel like it'd be better to put the count before the callchains
> > > > > for consistency like below. Is it OK to you?
> > > >
> > > > Consistency with what?
> > >
> > > Oh, I meant consistency with other callchain output style like graph,
> > > fractal or flat - They all show the numbers before callchains. And I
> > > think it's easier to read for human. :)
> >
> > Well, as I said, isn't the main object here the callchain? :-)
> >
> > And Brendan's request is for something to be consumed by scripts, i.e.
> > something like we have for perf stat:
> >
> > For humans:
> >
> > [root@felicio ~]# perf stat -e cycles -I 1000 -a
> > #           time             counts unit events
> >      1.000304391          1,820,038      cycles
> >      2.000490191      1,005,477,007      cycles
> >      3.000657813          1,717,007      cycles
> > ^C   3.917890293          2,804,034      cycles
> >
> > For machines/scripts:
> >
> > [root@felicio ~]# perf stat -x, -e cycles -I 1000 -a
> > 1.000291954,1923360,,cycles,3998167210,100.00
> > 2.000477154,1005608105,,cycles,3998475482,100.00
> > 3.000612612,1345483,,cycles,3998332391,100.00
> > 4.000744469,1005046913,,cycles,3998258199,100.00
> > ^C 4.331684347,1551327,,cycles,3463190970,100.00
> >
> > [root@felicio ~]#
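The point of `-x` is that each interval becomes one record, split on the separator. Below is a sketch of a consumer for the CSV lines quoted above, with these caveats:

- The field names, especially the last two (run time and percentage of time the counter was running), are my reading of the sample output, not a documented schema.
- `parse_stat_csv` is hypothetical.

```python
def parse_stat_csv(text, sep=","):
    """Parse 'perf stat -x<sep> -I <ms>' interval lines into dicts.

    Assumed field order, taken from the sample output: timestamp, counter
    value, unit (often empty), event name, run time, percent running.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("^C"):
            line = line[2:].strip()  # terminal echo of Ctrl-C, not perf output
        if not line or line.startswith("#"):
            continue
        ts, value, unit, event, run_time, pct = line.split(sep)[:6]
        records.append({
            "time": float(ts),
            "value": int(value),
            "unit": unit,
            "event": event,
            "run_time": int(run_time),
            "running_pct": float(pct),
        })
    return records
```

Counters that could not be read are reported with a placeholder string such as `<not counted>` in the value field. This sketch does not handle that case; `int()` would raise on it.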

> Yes, I thought about it too. Maybe the -t/--field-separator option can be
> used to separate folded callchains too.

What I meant here was: for humans, we don't want a field separator, and
we want headers, alignment, etc., while for scripts it's better to have
something easily parseable, with one record per line and no alignment
needed, etc.

> > > > The main thing here is the callchain, all the other stuff are things
> > > > related to it, so showing it first makes sense to me.
> > > >
> > > > Having some way to list the desired info to have for each callchain may
> > > > be interesting, and if he could do it like:
> > > >
> > > > -g folded,count,cpu,other,fields
> > > >
> > > > then he would know how to parse the per-callchain info at the end of
> > > > each line, right?
> > >
> > > Hmm.. it looks like that ends up with redundant info. I don't think
> >
> > What is redundant, and with what?
>
> When used in normal perf report cases, the other info in the
> callchain lines is redundant with the hist lines. Also, if a hist entry has

Sure, but if the user doesn't want to see the output of
hist_entry__snprintf()... :-)

> many callchains, each callchain line will have the same info in the other fields.

Sure, but that would be what the script expects to consume, i.e. one
line per callchain.

> > > it's generally useful for other 'perf report' use cases. Wouldn't it be
> > > better to just add minimal support and let the external tool parse the
> > > output?
> >
> > Oh well, perhaps we could have a 'perf callchain' tool that would be
> > centered on callchains and would provide one line per callchain, which
> > would have:
> >
> > callchain;separated;by;semicolons series,of,desired,fields,for,this,callchain
> >
> > Which would heavily reuse the 'perf report' / 'perf top' code for
> > histograms, no?
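With one line per callchain, a script that knows which fields it asked for can map the trailing values by position. This matches the `-g folded,count,cpu,other,fields` idea earlier in the thread. A minimal sketch, with caveats:

- The field list and `parse_callchain_line` are hypothetical.
- Neither this line format nor a 'perf callchain' tool existed at this point in the thread.

```python
def parse_callchain_line(line, field_names):
    """Parse 'frame;frame;... v1,v2,...' into (frames, {name: value}).

    field_names is the list the user would have requested, e.g.
    ["count", "cpu"] for a hypothetical '-g folded,count,cpu'.
    """
    stack, values = line.rstrip("\n").rsplit(" ", 1)
    fields = values.split(",")
    if len(fields) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(fields)}")
    return stack.split(";"), dict(zip(field_names, fields))
```

Values are left as strings because their types depend on which fields were requested. A field value that itself contains a comma, such as an unusual DSO path, would break the positional mapping.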

> I guess the callchain code is pretty isolated or can be isolated
> easily though.

> > I still think that this is a 'perf report' thing, but one that is
> > centered in callchains, and that is to be consumed by scripts, not
> > humans.

> Agreed.

> I'm just looking for a way to support it with minimal change. :)

Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
callchain code, or anything like that, just one long option switch and
we get what we need.

- Arnaldo

2015-11-03 01:35:55

by Namhyung Kim

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Mon, Nov 02, 2015 at 09:46:47PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim wrote:
> > On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > > I still think that this is a 'perf report' thing, but one that is
> > > centered in callchains, and that is to be consumed by scripts, not
> > > humans.
>
> > Agreed.
>
> > I'm just looking for a way to support it with minimal change. :)
>
> Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
> callchain code, or anything like that, just one long option switch and
> we get what we need.

Hmm.. okay. Let me think about the --no-hists flag then.

What do you want to do if the --no-hists flag is used without folded
callchain mode, or with a UI other than --stdio?

And if you want to print other info in the callchains, what would be
the output of non-folded mode?

I think the simplest solution would be to support the folded mode only
and error out in other cases. Is that OK with you?

Thanks,
Namhyung

2015-11-03 01:46:10

by Arnaldo Carvalho de Melo

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Tue, Nov 03, 2015 at 10:35:35AM +0900, Namhyung Kim wrote:
> On Mon, Nov 02, 2015 at 09:46:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim wrote:
> > > On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > I still think that this is a 'perf report' thing, but one that is
> > > > centered in callchains, and that is to be consumed by scripts, not
> > > > humans.
> >
> > > Agreed.
> >
> > > I'm just looking for a way to support it with minimal change. :)
> >
> > Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
> > callchain code, or anything like that, just one long option switch and
> > we get what we need.
>
> Hmm.. okay. Let me think about the --no-hists flag then.
>
> What do you want to do if the --no-hists flag is used without folded
> callchain mode, or with a UI other than --stdio?

What the user asked for: not showing what hist_entry__snprintf()
produces, i.e. showing just the callchains.

It's left to the user to decide if that output is good for whatever
purpose it has in mind.

From this discussion, we know that suppressing it when used with
folded callchains is useful, at least for Brendan's scripts :-)

> And if you want to print other info in the callchains, what would be
> the output of non-folded mode?

> I think the simplest solution would be to support the folded mode only
> and error out in other cases. Is that OK with you?

Well, the other info, if it comes at the end, may even be useful in
non-folded mode, no?

If it is not, then the user will not use it, i.e. some combinations may
not produce useful results, but if we want more flexibility to support
use cases like Brendan's (and I think we do) without making the
existing code overly complex, then why not?

- Arnaldo

2015-11-03 03:17:16

by Namhyung Kim

Subject: Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)

On Mon, Nov 02, 2015 at 10:46:00PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 03, 2015 at 10:35:35AM +0900, Namhyung Kim wrote:
> > On Mon, Nov 02, 2015 at 09:46:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim wrote:
> > > > On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > I still think that this is a 'perf report' thing, but one that is
> > > > > centered in callchains, and that is to be consumed by scripts, not
> > > > > humans.
> > >
> > > > Agreed.
> > >
> > > > I'm just looking for a way to support it with minimal change. :)
> > >
> > > Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
> > > callchain code, or anything like that, just one long option switch and
> > > we get what we need.
> >
> > Hmm.. okay. Let me think about the --no-hists flag then.
> >
> > What do you want to do if the --no-hists flag is used without folded
> > callchain mode, or with a UI other than --stdio?
>
> What the user asked for: not showing what hist_entry__snprintf()
> produces, i.e. showing just the callchains.
>
> It's left to the user to decide if that output is good for whatever
> purpose it has in mind.

OK, will add it in a follow-up patch after checking TUI and GTK.

>
> From this discussion, we know that suppressing it when used with
> folded callchains is useful, at least for Brendan's scripts :-)

OK

>
> > And if you want to print other info in the callchains, what would be
> > the output of non-folded mode?
>
> > I think the simplest solution would be to support the folded mode only
> > and error out in other cases. Is that OK with you?
>
> Well, the other info, if it comes at the end, may even be useful in
> non-folded mode, no?

At the end? Brendan wanted to have it first and I think it'd be
better to show it first.

Anyway, this other info depends on the sort keys, IOW it cannot show
the task comm name if the user gave sort keys without comm, like '-s cpu'.
So how about adding an 'info' or 'context' (or whatever we name it)
option to -g/--call-graph to show the info selected by the sort keys?

For example,

$ perf report --no-children --stdio -s comm,dso -g folded,info --no-hists
28.63% swapper,[kernel.vmlinux] intel_idle;cpuidle_enter_state;...
11.30% swapper,[kernel.vmlinux] intel_idle;cpuidle_enter_state;...
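With the value and the sort-key info moved to the front, a consumer reads the fields left to right. Below is a sketch for the proposed `-g folded,info` line format shown above, with caveats:

- `folded,info` is a proposal in this mail, not an existing perf option.
- `parse_folded_info` is hypothetical.
- The '...' in the example is elided text; the parser simply treats it as another frame.

```python
def parse_folded_info(line):
    """Parse '<pct>% <info,fields> <frame;frame;...>' into its three parts.

    Assumes the info fields contain no whitespace, so the line splits
    cleanly on its first two runs of whitespace.
    """
    pct, info, stack = line.strip().split(None, 2)
    return float(pct.rstrip("%")), info.split(","), stack.split(";")
```

Splitting on the first two whitespace runs breaks if an info field contains a space, for example a comm with an embedded space. That is one argument for an explicit separator like the -t/--field-separator idea mentioned earlier in the thread.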


$ perf report --no-children --stdio -s pid,sym -g info
...
39.93% swapper [k] intel_idle
<0:swapper,intel_idle>
|
|---intel_idle
cpuidle_enter_state
...

What do you think?

Thanks,
Namhyung


>
> If it is not, then the user will not use it, i.e. some combinations may
> not produce useful results, but if we want more flexibility to support
> use cases like Brendan's (and I think we do) without making the
> existing code overly complex, then why not?
>
> - Arnaldo