2020-12-10 23:07:20

by Alexander Antonov

Subject: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics

The mode is intended to provide four I/O performance metrics, in MB, for
each IIO stack:
- Inbound Read: I/O devices below IIO stack read from the host memory
- Inbound Write: I/O devices below IIO stack write to the host memory
- Outbound Read: CPU reads from I/O devices below IIO stack
- Outbound Write: CPU writes to I/O devices below IIO stack

Each metric requires only one IIO event, which increments on every 4B
transfer in the corresponding direction. The formula for computing each
metric is generic:
#EventCount * 4B / (1024 * 1024)
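
As a minimal sketch of the formula above, the conversion can be written as a
small shell function. The name iio_events_to_mb is hypothetical and not part
of perf, and the use of truncating integer division is an assumption, since
the cover letter does not say how the displayed value is rounded.

```shell
# Convert a raw IIO event count to MB: count * 4 bytes / (1024 * 1024).
# iio_events_to_mb is a hypothetical helper, not part of perf.
iio_events_to_mb() {
    # Integer division truncates; how perf itself rounds is an assumption.
    echo $(( $1 * 4 / (1024 * 1024) ))
}
```

For example, 262144 events of 4B each are exactly 1048576 bytes, so
"iio_events_to_mb 262144" prints 1.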

Note: --iiostat introduces a new perf data aggregation mode (per I/O stack),
hence the -e and -M options are not supported with it.

Usage examples:

1. List all IIO stacks (example for 2-S platform):
$ perf stat --iiostat=show
S0-uncore_iio_0<0000:00>
S1-uncore_iio_0<0000:80>
S0-uncore_iio_1<0000:17>
S1-uncore_iio_1<0000:85>
S0-uncore_iio_2<0000:3a>
S1-uncore_iio_2<0000:ae>
S0-uncore_iio_3<0000:5d>
S1-uncore_iio_3<0000:d7>
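
Judging only by the listing above, each line appears to have the form
S<socket>-<PMU name><<domain>:<root bus>>. Under that assumption, a line can
be split into its fields with plain parameter expansion. The helper name
parse_iio_stack is hypothetical and not part of perf.

```shell
# Split one line of the stack listing into "socket pmu root-bus".
# Assumes the format S<socket>-<pmu><<domain>:<bus>> seen in the example;
# parse_iio_stack is a hypothetical helper, not part of perf.
parse_iio_stack() {
    line=$1
    socket=${line%%-*}    # text before the first '-', e.g. "S0"
    rest=${line#*-}       # text after the first '-'
    pmu=${rest%%<*}       # PMU name, e.g. "uncore_iio_0"
    bus=${rest#*<}        # drop everything up to '<'
    bus=${bus%>}          # drop the trailing '>'
    printf '%s %s %s\n' "${socket#S}" "$pmu" "$bus"
}
```

With the listing above, "parse_iio_stack 'S1-uncore_iio_2<0000:ae>'" prints
"1 uncore_iio_2 0000:ae".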

2. Collect metrics for all I/O stacks:
$ perf stat --iiostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
357708+0 records in
357707+0 records out
375083606016 bytes (375 GB, 349 GiB) copied, 215.974 s, 1.7 GB/s

Performance counter stats for 'system wide':

port     Inbound Read(MB)  Inbound Write(MB)  Outbound Read(MB)  Outbound Write(MB)
0000:00                 1                  0                  2                   3
0000:80                 0                  0                  0                   0
0000:17            352552                 43                  0                  21
0000:85                 0                  0                  0                   0
0000:3a                 3                  0                  0                   0
0000:ae                 0                  0                  0                   0
0000:5d                 0                  0                  0                   0
0000:d7                 0                  0                  0                   0

3. Collect metrics for a comma-separated list of I/O stacks:
$ perf stat --iiostat=0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
357708+0 records in
357707+0 records out
375083606016 bytes (375 GB, 349 GiB) copied, 197.08 s, 1.9 GB/s

Performance counter stats for 'system wide':

port     Inbound Read(MB)  Inbound Write(MB)  Outbound Read(MB)  Outbound Write(MB)
0000:17            358559                 44                  0                  22
0000:3a                 3                  2                  0                   0

197.081983474 seconds time elapsed
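
In example 3 the stack is requested as "0:3a" but reported as "0000:3a",
which suggests the short form is read as hexadecimal <domain>:<bus> and
zero-padded. A sketch of that normalization, under that assumption, is shown
below. The helper name normalize_root_bus is hypothetical, and the actual
parsing in the patchset may differ.

```shell
# Zero-pad a <domain>:<bus> pair to the 4-digit:2-digit form used in the
# output table. Assumes both fields are hexadecimal; normalize_root_bus is
# a hypothetical helper, not part of perf.
normalize_root_bus() {
    domain=${1%%:*}    # text before the ':'
    bus=${1#*:}        # text after the ':'
    printf '%04x:%02x\n' "0x$domain" "0x$bus"
}
```

For instance, "normalize_root_bus 0:3a" prints "0000:3a".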

Alexander Antonov (5):
perf stat: Add AGGR_IIO_STACK mode
perf evsel: Introduce an observed performance device
perf stat: Basic support for iiostat in perf stat
perf stat: Helper functions for IIO stacks list in iiostat mode
perf stat: Enable --iiostat mode for x86 platforms

tools/perf/Documentation/perf-stat.txt | 31 ++
tools/perf/arch/x86/util/Build | 1 +
tools/perf/arch/x86/util/iiostat.c | 460 ++++++++++++++++++
tools/perf/builtin-stat.c | 38 +-
tools/perf/util/evsel.h | 1 +
tools/perf/util/iiostat.h | 33 ++
.../scripting-engines/trace-event-python.c | 2 +-
tools/perf/util/stat-display.c | 51 +-
tools/perf/util/stat-shadow.c | 11 +-
tools/perf/util/stat.c | 3 +-
tools/perf/util/stat.h | 2 +
11 files changed, 625 insertions(+), 8 deletions(-)
create mode 100644 tools/perf/arch/x86/util/iiostat.c
create mode 100644 tools/perf/util/iiostat.h


base-commit: 644bf4b0f7acde641d3db200b4db66977e96c3bd
--
2.19.1


2020-12-14 17:48:14

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics

On Thu, Dec 10, 2020 at 12:03:35PM +0300, Alexander Antonov wrote:
> Mode is intended to provide four I/O performance metrics in MB per each
> IIO stack:
> - Inbound Read: I/O devices below IIO stack read from the host memory
> - Inbound Write: I/O devices below IIO stack write to the host memory
> - Outbound Read: CPU reads from I/O devices below IIO stack
> - Outbound Write: CPU writes to I/O devices below IIO stack
>
> Each metric requires only one IIO event which increments at every 4B
> transfer in corresponding direction. The formulas to compute metrics
> are generic:
> #EventCount * 4B / (1024 * 1024)
>
> Note: --iiostat introduces new perf data aggregation mode - per I/O stack
> hence -e and -M options are not supported.
>
> Usage examples:

My first thought was: Why not have a 'perf iiostat' subcommand?

You're reusing the aggregation code for 'perf stat', and for that I'd
love to have Ian, Andi, Jiri et al. look at how you implemented it,
but I think having a shorter way of using this would be interesting :-)

- Arnaldo

> 1. List all IIO stacks (example for 2-S platform):
> $ perf stat --iiostat=show

Would be:

$ perf iiostat show

> S0-uncore_iio_0<0000:00>
> S1-uncore_iio_0<0000:80>
> S0-uncore_iio_1<0000:17>
> S1-uncore_iio_1<0000:85>
> S0-uncore_iio_2<0000:3a>
> S1-uncore_iio_2<0000:ae>
> S0-uncore_iio_3<0000:5d>
> S1-uncore_iio_3<0000:d7>
>
> 2. Collect metrics for all I/O stacks:
> $ perf stat --iiostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

$ perf iiostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

> 357708+0 records in
> 357707+0 records out
> 375083606016 bytes (375 GB, 349 GiB) copied, 215.974 s, 1.7 GB/s
>
> Performance counter stats for 'system wide':
>
> port     Inbound Read(MB)  Inbound Write(MB)  Outbound Read(MB)  Outbound Write(MB)
> 0000:00                 1                  0                  2                   3
> 0000:80                 0                  0                  0                   0
> 0000:17            352552                 43                  0                  21
> 0000:85                 0                  0                  0                   0
> 0000:3a                 3                  0                  0                   0
> 0000:ae                 0                  0                  0                   0
> 0000:5d                 0                  0                  0                   0
> 0000:d7                 0                  0                  0                   0
>
> 3. Collect metrics for comma separated list of I/O stacks:
> $ perf stat --iiostat=0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

$ perf iiostat 0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

> 357708+0 records in
> 357707+0 records out
> 375083606016 bytes (375 GB, 349 GiB) copied, 197.08 s, 1.9 GB/s
>
> Performance counter stats for 'system wide':
>
> port     Inbound Read(MB)  Inbound Write(MB)  Outbound Read(MB)  Outbound Write(MB)
> 0000:17            358559                 44                  0                  22
> 0000:3a                 3                  2                  0                   0
>
> 197.081983474 seconds time elapsed
>
> Alexander Antonov (5):
> perf stat: Add AGGR_IIO_STACK mode
> perf evsel: Introduce an observed performance device
> perf stat: Basic support for iiostat in perf stat
> perf stat: Helper functions for IIO stacks list in iiostat mode
> perf stat: Enable --iiostat mode for x86 platforms
>
> tools/perf/Documentation/perf-stat.txt | 31 ++
> tools/perf/arch/x86/util/Build | 1 +
> tools/perf/arch/x86/util/iiostat.c | 460 ++++++++++++++++++
> tools/perf/builtin-stat.c | 38 +-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/iiostat.h | 33 ++
> .../scripting-engines/trace-event-python.c | 2 +-
> tools/perf/util/stat-display.c | 51 +-
> tools/perf/util/stat-shadow.c | 11 +-
> tools/perf/util/stat.c | 3 +-
> tools/perf/util/stat.h | 2 +
> 11 files changed, 625 insertions(+), 8 deletions(-)
> create mode 100644 tools/perf/arch/x86/util/iiostat.c
> create mode 100644 tools/perf/util/iiostat.h
>
>
> base-commit: 644bf4b0f7acde641d3db200b4db66977e96c3bd
> --
> 2.19.1
>

--

- Arnaldo

2020-12-15 03:09:52

by Andi Kleen

Subject: Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics

> My first thought was: Why not have a 'perf iiostat' subcommand?

Same would apply to a lot of options in perf stat.

I guess you could add some aliases to "perf" that give shortcuts
for common perf stat command lines.

-Andi

2020-12-15 14:01:53

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics

On Mon, Dec 14, 2020 at 07:04:30PM -0800, Andi Kleen wrote:
> > My first thought was: Why not have a 'perf iiostat' subcommand?

> Same would apply to a lot of options in perf stat.

> I guess you could add some aliases to "perf" that give shortcuts
> for common perf stat command lines.

Yeah, and we have a mechanism for that, one that so far has been exercised
only in the 'perf archive' case:

~/libexec/perf-core/perf-archive

I tried this and it works:

[root@five ~]# ls -la ~/bin/perf
lrwxrwxrwx. 1 root root 19 Feb 18 2020 /root/bin/perf -> /home/acme/bin/perf
[root@five ~]# vim ~acme/libexec/perf-core/perf-cgtop
[root@five ~]# chmod +x ~acme/libexec/perf-core/perf-cgtop
[root@five ~]# cat ~acme/libexec/perf-core/perf-cgtop
perf top --hierarchy --all-cgroups -s cgroup,dso,sym $*
[root@five ~]# perf cgtop
[root@five ~]#

Use 'e' to expand/collapse the current level (+ -> -), and 'E'/'C' to
expand/collapse all levels.

'perf help' doesn't show it, which is a shame. I'll add support for it
to traverse ~/libexec/perf-core/perf-* and use the first non-interpreter
comment line as the description of the command, so that adding a new one
is just a matter of dropping in a shell script plus a man page, with no
need to change the perf binary.


To test that '$*' at the end:

[root@five ~]# perf cgtop -U

I.e.:

[acme@five perf]$ perf top -h -U

Usage: perf top [<options>]

-U, --hide_user_symbols
hide user symbols

[acme@five perf]$

And it works: just kernel-level samples grouped in a hierarchy, first by
cgroup, then by dso, then by symbol.

Also, using this with the 'P' hotkey:

[root@five ~]# perf cgtop --percent-limit 1

shows what it looks like:

[root@five ~]# cat perf.hist.0
- 86.77% /user.slice/user-1000.slice/session-2.scope
    - 36.18% [kernel]
          2.24% [k] unmap_page_range
          1.15% [k] clear_page_rep
          1.10% [k] add_mm_counter_fast
          1.03% [k] alloc_set_pte
          1.03% [k] handle_mm_fault
    - 17.65% libc-2.32.so
          2.04% [.] _int_malloc
          1.82% [.] __memmove_avx_unaligned_erms
          1.48% [.] __strlen_avx2
          1.13% [.] _int_free
          1.12% [.] malloc
    - 8.09% make
          1.65% [.] jhash_string
          1.05% [.] hash_find_slot
    - 6.90% ld-2.32.so
          2.03% [.] do_lookup_x
          1.49% [.] _dl_lookup_symbol_x
    - 4.78% cc1
    - 4.60% libperl.so.5.32.0
    - 2.86% bash
    - 1.98% libselinux.so.1
    - 1.61% libpython2.7.so.1.0
    - 1.06% libpcre2-8.so.0.10.0
- 9.17% /user.slice/user-1000.slice/session-4.scope
    - 4.66% perf
    - 2.40% libc-2.32.so
    - 1.82% [kernel]
- 4.04% /
    - 4.02% [kernel]
[root@five ~]#

So 'perf iiostat' would become:

[root@five ~]# cat ~acme/libexec/perf-core/perf-iiostat
perf stat --iiostat $*
[root@five ~]#

There are parameters to that '--iiostat' in the current patchset that
may complicate this, though; with some changes I guess we can get what
we want.
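
One way the parameter handling could look is sketched below. It is only an
illustration: the rule that a leading non-option word becomes the value of
'--iiostat=' is an assumption based on the 'perf iiostat show' spelling
suggested earlier in the thread, and the PERF variable and the iiostat
function name are hypothetical. The sketch also forwards arguments with
"$@" instead of $*, so that arguments containing spaces stay intact.

```shell
#!/bin/sh
# Hypothetical body for a perf-iiostat wrapper script; not from the patchset.
# PERF can be overridden (e.g. PERF=echo) to see the command without running it.
PERF=${PERF:-perf}

iiostat() {
    case ${1-} in
        ''|-*)
            # No stack list given: plain --iiostat, remaining args untouched.
            "$PERF" stat --iiostat "$@"
            ;;
        *)
            # Assumption: a leading word such as "show" or "0000:17,0:3a"
            # is the optional value of --iiostat.
            arg=$1
            shift
            "$PERF" stat "--iiostat=$arg" "$@"
            ;;
    esac
}
```

As a standalone script the file would end by calling the function with the
script's own arguments, i.e. iiostat "$@". With PERF=echo, "iiostat show"
prints "stat --iiostat=show".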

- Arnaldo

2020-12-20 17:27:04

by Alexander Antonov

Subject: Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics

Hello Arnaldo,
Sorry for the delayed response.

This is an interesting approach to getting a shorter command. Thank you for
the explanation. I will update the patchset.

- Alexander