2023-10-26 06:26:23

by Namhyung Kim

Subject: [PATCH] perf tools: Add -H short option for --hierarchy

I found the hierarchy mode useful, but it's easy to make a typo when
using it. Let's add a short option for that.

Also update the documentation. :)

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 29 ++++++++++++++++++++-
tools/perf/Documentation/perf-top.txt | 32 +++++++++++++++++++++++-
tools/perf/builtin-report.c | 2 +-
tools/perf/builtin-top.c | 2 +-
4 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index af068b4f1e5a..7d8916b2b7f7 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -528,8 +528,35 @@ include::itrace.txt[]
--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.

+-H::
--hierarchy::
- Enable hierarchical output.
+ Enable hierarchical output. In hierarchy mode, each sort key groups
+ samples based on its criteria, and each group is then subdivided using
+ the next lower-level sort key.
+
+ For example:
+ In normal output:
+
+ perf report -s dso,sym
+ # Overhead Shared Object Symbol
+ 50.00% [kernel.kallsyms] [k] kfunc1
+ 20.00% perf [.] foo
+ 15.00% [kernel.kallsyms] [k] kfunc2
+ 10.00% perf [.] bar
+ 5.00% libc.so [.] libcall
+
+ In hierarchy output:
+
+ perf report -s dso,sym --hierarchy
+ # Overhead Shared Object / Symbol
+ 65.00% [kernel.kallsyms]
+ 50.00% [k] kfunc1
+ 15.00% [k] kfunc2
+ 30.00% perf
+ 20.00% [.] foo
+ 10.00% [.] bar
+ 5.00% libc.so
+ 5.00% [.] libcall

--inline::
If a callgraph address belongs to an inlined function, the inline stack
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 3c202ec080ba..a754875fa5bb 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -261,8 +261,38 @@ Default is to monitor all CPUS.
--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.

+-H::
--hierarchy::
- Enable hierarchy output.
+ Enable hierarchical output. In hierarchy mode, each sort key groups
+ samples based on its criteria, and each group is then subdivided using
+ the next lower-level sort key.
+
+ For example, in normal output:
+
+ perf report -s dso,sym
+ #
+ # Overhead Shared Object Symbol
+ # ........ ................. ...........
+ 50.00% [kernel.kallsyms] [k] kfunc1
+ 20.00% perf [.] foo
+ 15.00% [kernel.kallsyms] [k] kfunc2
+ 10.00% perf [.] bar
+ 5.00% libc.so [.] libcall
+
+ In hierarchy output:
+
+ perf report -s dso,sym --hierarchy
+ #
+ # Overhead Shared Object / Symbol
+ # .......... ......................
+ 65.00% [kernel.kallsyms]
+ 50.00% [k] kfunc1
+ 15.00% [k] kfunc2
+ 30.00% perf
+ 20.00% [.] foo
+ 10.00% [.] bar
+ 5.00% libc.so
+ 5.00% [.] libcall

--overwrite::
Enable this to use just the most recent records, which helps in high core count
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ca8f2331795c..b16680d0f82c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1392,7 +1392,7 @@ int cmd_report(int argc, const char **argv)
"only show processor socket that match with this filter"),
OPT_BOOLEAN(0, "raw-trace", &symbol_conf.raw_trace,
"Show raw trace event output (do not use print fmt or plugins)"),
- OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
+ OPT_BOOLEAN('H', "hierarchy", &symbol_conf.report_hierarchy,
"Show entries in a hierarchy"),
OPT_CALLBACK_DEFAULT(0, "stdio-color", NULL, "mode",
"'always' (default), 'never' or 'auto' only applicable to --stdio mode",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ea8c7eca5eee..3cccb2a516dc 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1573,7 +1573,7 @@ int cmd_top(int argc, const char **argv)
"add last branch records to call history"),
OPT_BOOLEAN(0, "raw-trace", &symbol_conf.raw_trace,
"Show raw trace event output (do not use print fmt or plugins)"),
- OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
+ OPT_BOOLEAN('H', "hierarchy", &symbol_conf.report_hierarchy,
"Show entries in a hierarchy"),
OPT_BOOLEAN(0, "overwrite", &top.record_opts.overwrite,
"Use a backward ring buffer, default: no"),
--
2.42.0.758.gaed0368e0e-goog
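
The grouping described in the documentation hunks above can be illustrated with a small standalone sketch. It is not perf code: the struct and function names are hypothetical, and the rows are simply the example figures from the patch. It sums the overhead per first-level key (dso) and prints each symbol under its parent. The input rows are already ordered by overhead, so the sketch skips the per-level sorting a real implementation would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flat row: one line of the "normal output" example. */
struct sample_row {
	const char *dso;
	const char *sym;
	double overhead;	/* percent */
};

/* Example figures copied from the documentation in the patch. */
static const struct sample_row rows[] = {
	{ "[kernel.kallsyms]", "[k] kfunc1",  50.0 },
	{ "perf",              "[.] foo",     20.0 },
	{ "[kernel.kallsyms]", "[k] kfunc2",  15.0 },
	{ "perf",              "[.] bar",     10.0 },
	{ "libc.so",           "[.] libcall",  5.0 },
};
#define NR_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Sum the overhead of every row that shares the first-level key. */
static double dso_total(const char *dso)
{
	double total = 0.0;

	for (size_t i = 0; i < NR_ROWS; i++)
		if (!strcmp(rows[i].dso, dso))
			total += rows[i].overhead;
	return total;
}

/* Print one parent line per distinct dso, then its symbols indented. */
static void print_hierarchy(void)
{
	for (size_t i = 0; i < NR_ROWS; i++) {
		bool seen = false;

		/* Skip a dso that was already printed as a parent. */
		for (size_t j = 0; j < i; j++)
			if (!strcmp(rows[j].dso, rows[i].dso))
				seen = true;
		if (seen)
			continue;

		printf("%6.2f%%  %s\n", dso_total(rows[i].dso), rows[i].dso);
		for (size_t j = i; j < NR_ROWS; j++)
			if (!strcmp(rows[j].dso, rows[i].dso))
				printf("  %6.2f%%  %s\n", rows[j].overhead, rows[j].sym);
	}
}
```

Calling print_hierarchy() yields parent totals of 65, 30 and 5 percent, matching the hierarchy example in the documentation.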


2023-10-26 06:46:26

by Adrian Hunter

Subject: Re: [PATCH] perf tools: Add -H short option for --hierarchy

On 26/10/23 09:26, Namhyung Kim wrote:
> I found the hierarchy mode useful, but it's easy to make a typo when
> using it. Let's add a short option for that.
>
> Also update the documentation. :)

Perhaps it would also be possible to support bash-completions for
long options

2023-10-26 17:19:51

by Namhyung Kim

Subject: Re: [PATCH] perf tools: Add -H short option for --hierarchy

Hi Adrian,

On Wed, Oct 25, 2023 at 11:46 PM Adrian Hunter <[email protected]> wrote:
>
> On 26/10/23 09:26, Namhyung Kim wrote:
> > I found the hierarchy mode useful, but it's easy to make a typo when
> > using it. Let's add a short option for that.
> >
> > Also update the documentation. :)
>
> Perhaps it would also be possible to support bash-completions for
> long options

I believe it already supports long options. But one of my setups
doesn't work with bash completion. :-(

Thanks,
Namhyung

2023-10-26 20:02:25

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH] perf tools: Add -H short option for --hierarchy

On Thu, Oct 26, 2023 at 09:46:02AM +0300, Adrian Hunter wrote:
> On 26/10/23 09:26, Namhyung Kim wrote:
> > I found the hierarchy mode useful, but it's easy to make a typo when
> > using it. Let's add a short option for that.

> > Also update the documentation. :)

> Perhaps it would also be possible to support bash-completions for
> long options

It works:

# . ~acme/git/linux/tools/perf/perf-completion.sh
# perf top --hi<TAB>
--hide_kernel_symbols --hide_user_symbols --hierarchy
#

And:

perf top --hie<ENTER>

works as it is unambiguous (so far).

What we don't have is a way to use hierarchy by default, i.e. we should
have:

perf config top.hierarchy=1

and then:

perf top

would always use the hierarchy view.

tools/perf/Documentation/perf-config.txt has the options that can be
set, like:

# perf report | head -15
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 373K of event 'cycles:P'
# Event count (approx.): 205365133495
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...................................
#
3.17% MediaDe~hine #6 libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
2.31% swapper [kernel.vmlinux] [k] psi_group_change
1.87% MediaSu~sor #10 libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
1.84% MediaSu~isor #7 libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
#

Then:

# perf config report.sort_order=dso
# perf report | head -15
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 373K of event 'cycles:P'
# Event count (approx.): 205365133495
#
# Overhead Shared Object
# ........ ..............................................
#
59.52% [kernel.vmlinux]
19.79% libc.so.6
8.07% libxul.so
5.25% libopenh264.so.2.3.1
#

# cat ~/.perfconfig
# this file is auto-generated.
[report]
sort_order = dso
[root@five ~]# perf config report.sort_order
report.sort_order=dso
#

Right now 'perf top' has only:

static int perf_top_config(const char *var, const char *value, void *cb __maybe_unused)
{
if (!strcmp(var, "top.call-graph")) {
var = "call-graph.record-mode";
return perf_default_config(var, value, cb);
}
if (!strcmp(var, "top.children")) {
symbol_conf.cumulate_callchain = perf_config_bool(var, value);
return 0;
}

return 0;
}
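
As a rough sketch only, a handler of this shape could grow a branch for the proposed top.hierarchy key. Nothing below is part of the posted patch: the struct, the simplified boolean parser and the `_sketch` names are stand-ins so the example compiles on its own, and the real perf_config_bool() accepts more spellings than this one does.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Stand-in for the two symbol_conf fields the sketch touches. */
struct symbol_conf_sketch {
	bool cumulate_callchain;
	bool report_hierarchy;
};
static struct symbol_conf_sketch symbol_conf;

/* Simplified stand-in for perf_config_bool(); assumption, not the real parser. */
static bool config_bool_sketch(const char *value)
{
	if (!value)
		return false;
	return !strcmp(value, "1") || !strcasecmp(value, "true") ||
	       !strcasecmp(value, "yes") || !strcasecmp(value, "on");
}

static int perf_top_config_sketch(const char *var, const char *value)
{
	if (!strcmp(var, "top.children")) {
		symbol_conf.cumulate_callchain = config_bool_sketch(value);
		return 0;
	}
	/* Hypothetical key discussed in this thread; it does not exist yet. */
	if (!strcmp(var, "top.hierarchy")) {
		symbol_conf.report_hierarchy = config_bool_sketch(value);
		return 0;
	}

	return 0;
}
```

With something like this in place, a "top.hierarchy" entry in ~/.perfconfig would set the same flag that --hierarchy sets on the command line.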

This would be similar to what was done for --no-children on:

https://git.kernel.org/torvalds/c/104ac991bd821773cba6f262f97a4a752ed76dd5

$ git show --pretty=full 104ac991bd821773cba6f262f97a4a752ed76dd5 | head -5
commit 104ac991bd821773cba6f262f97a4a752ed76dd5
Author: Namhyung Kim <[email protected]>
Commit: Jiri Olsa <[email protected]>

perf top: Add top.children config option

- Arnaldo

2023-11-06 04:44:16

by Namhyung Kim

Subject: Re: [PATCH] perf tools: Add -H short option for --hierarchy

Hi Arnaldo,

On Thu, Oct 26, 2023 at 1:02 PM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Thu, Oct 26, 2023 at 09:46:02AM +0300, Adrian Hunter wrote:
> > On 26/10/23 09:26, Namhyung Kim wrote:
> > > I found the hierarchy mode useful, but it's easy to make a typo when
> > > using it. Let's add a short option for that.
>
> > > Also update the documentation. :)
>
> > Perhaps it would also be possible to support bash-completions for
> > long options
>
> It works:
>
> # . ~acme/git/linux/tools/perf/perf-completion.sh
> # perf top --hi<TAB>
> --hide_kernel_symbols --hide_user_symbols --hierarchy
> #
>
> And:
>
> perf top --hie<ENTER>
>
> works as it is unambiguous (so far).

Thanks for the test!

>
> What we don't have is a way to use hierarchy by default, i.e. we should
> have:
>
> perf config top.hierarchy=1
>
> and then:
>
> perf top
>
> would always use the hierarchy view.
>
> tools/perf/Documentation/perf-config.txt has the options that can be
> set, like:
>
> # perf report | head -15
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 373K of event 'cycles:P'
> # Event count (approx.): 205365133495
> #
> # Overhead Command Shared Object Symbol
> # ........ ............... ................. ...................................
> #
> 3.17% MediaDe~hine #6 libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
> 2.31% swapper [kernel.vmlinux] [k] psi_group_change
> 1.87% MediaSu~sor #10 libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
> 1.84% MediaSu~isor #7 libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
> #
>
> Then:
>
> # perf config report.sort_order=dso
> # perf report | head -15
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 373K of event 'cycles:P'
> # Event count (approx.): 205365133495
> #
> # Overhead Shared Object
> # ........ ..............................................
> #
> 59.52% [kernel.vmlinux]
> 19.79% libc.so.6
> 8.07% libxul.so
> 5.25% libopenh264.so.2.3.1
> #
>
> # cat ~/.perfconfig
> # this file is auto-generated.
> [report]
> sort_order = dso
> [root@five ~]# perf config report.sort_order
> report.sort_order=dso
> #
>
> Right now 'perf top' has only:
>
> static int perf_top_config(const char *var, const char *value, void *cb __maybe_unused)
> {
> if (!strcmp(var, "top.call-graph")) {
> var = "call-graph.record-mode";
> return perf_default_config(var, value, cb);
> }
> if (!strcmp(var, "top.children")) {
> symbol_conf.cumulate_callchain = perf_config_bool(var, value);
> return 0;
> }
>
> return 0;
> }
>
> This would be similar to what was done for --no-children on:

Sure, I can add the config option later. But it's not
compatible with some options that change the output,
like --children and --fields. Maybe it needs some kind of
priority among settings to handle the incompatible ones.
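
One possible shape for such a priority scheme, purely as an illustration: remember where each setting came from and let the higher-priority source win when two enabled settings conflict. All names here are hypothetical and none of this reflects how perf resolves the conflict today. The tie-break (hierarchy wins when both come from the same source) is an arbitrary assumption.

```c
#include <stdbool.h>

/* Hypothetical origin of a setting, ordered from lowest to highest priority. */
enum setting_source {
	SRC_DEFAULT = 0,
	SRC_CONFIG,	/* e.g. ~/.perfconfig */
	SRC_CMDLINE,	/* explicit command-line option */
};

struct view_setting {
	bool enabled;
	enum setting_source source;
};

/* Accept a new value only from a source at least as strong as the current one. */
static void setting_update(struct view_setting *s, bool enabled,
			   enum setting_source src)
{
	if (src >= s->source) {
		s->enabled = enabled;
		s->source = src;
	}
}

/*
 * If both incompatible views are enabled, keep the one from the
 * higher-priority source. On a tie, hierarchy is kept (assumption).
 */
static void resolve_conflict(struct view_setting *hierarchy,
			     struct view_setting *children)
{
	if (!hierarchy->enabled || !children->enabled)
		return;

	if (children->source > hierarchy->source)
		hierarchy->enabled = false;
	else
		children->enabled = false;
}
```

Under this policy, a hierarchy default from the config file would give way to an explicit --children on the command line, while an explicit --hierarchy would override a children default from the config.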

Thanks,
Namhyung

>
> https://git.kernel.org/torvalds/c/104ac991bd821773cba6f262f97a4a752ed76dd5
>
> $ git show --pretty=full 104ac991bd821773cba6f262f97a4a752ed76dd5 | head -5
> commit 104ac991bd821773cba6f262f97a4a752ed76dd5
> Author: Namhyung Kim <[email protected]>
> Commit: Jiri Olsa <[email protected]>
>
> perf top: Add top.children config option
>
> - Arnaldo