2015-05-22 14:39:12

by Martin Liška

[permalink] [raw]
Subject: [RFC] Add --show-total-period for perf annotate

Hello.

I've been working on a new feature for perf annotate, which should be able to annotate
instructions with total spent time (compared to percentage usage).

Let's consider following use-case. You want to compare two different compilers
on the same code base and let's assume 90% of wall-time is spent in a single function.
Moreover, let's say that these compilers produce assembly of a totally different size.

In such case, it's very useful to get an approximation of spent time on a bunch of instructions,
which can be compared among other compilers. Otherwise, one has to somehow sum percentages and compare
it to size of a function.

Mail contains my experimental patch.

However, last missing thing is that I need to calculate portion of cycles a function
utilizes. May I ask you for help as perf annotate does not count these portions.

Thanks,
Martin


Attachments:
0001-First-semi-working-version.patch (5.89 kB)

2015-05-23 04:08:40

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] Add --show-total-period for perf annotate

Martin Liška <[email protected]> writes:

> I've been working on a new feature for perf annotate, which should be able to annotate
> instructions with total spent time (compared to percentage usage).
>
> Let's consider following use-case. You want to compare two different compilers
> on the same code base and let's assume 90% of wall-time is spent in a single function.
> Moreover, let's say that these compilers produce assembly of a totally different size.
>
> In such case, it's very useful to get an approximation of spent time on a bunch of instructions,
> which can be compared among other compilers. Otherwise, one has to somehow sum percentages and compare
> it to size of a function.

perf diff does not handle this? Especially with the differential
profiling options it should.

>> @@ -623,6 +624,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> if (!target__none(&opts->target) && !opts->initial_delay)
> perf_evlist__enable(rec->evlist);
>
> + t0 = rdclock();
> +
> /*
> * Let the child rip
> */
> @@ -692,6 +695,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> goto out_child;
> }
>
> + t1 = rdclock();
> + walltime_nsecs = t1 - t0;

The walltime can be later computed by the difference of the first and
the last time stamp after sorting the events. So you don't need the new header.

-Andi

--
[email protected] -- Speaking for myself only

2015-05-25 07:46:11

by Martin Liška

[permalink] [raw]
Subject: Re: [RFC] Add --show-total-period for perf annotate

On 05/23/2015 06:08 AM, Andi Kleen wrote:
> Martin Liška <[email protected]> writes:
>
>> I've been working on a new feature for perf annotate, which should be able to annotate
>> instructions with total spent time (compared to percentage usage).
>>
>> Let's consider following use-case. You want to compare two different compilers
>> on the same code base and let's assume 90% of wall-time is spent in a single function.
>> Moreover, let's say that these compilers produce assembly of a totally different size.
>>
>> In such case, it's very useful to get an approximation of spent time on a bunch of instructions,
>> which can be compared among other compilers. Otherwise, one has to somehow sum percentages and compare
>> it to size of a function.
>
> perf diff does not handle this? Especially with the differential
> profiling options it should.

It does not work if you, in my case, compare ICC and GCC, where ICC uses a different mangling
scheme for fortran modules. Moreover, situation can be more complicated if a compiler performs
a bit different inlining decisions.

>
>>> @@ -623,6 +624,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>> if (!target__none(&opts->target) && !opts->initial_delay)
>> perf_evlist__enable(rec->evlist);
>>
>> + t0 = rdclock();
>> +
>> /*
>> * Let the child rip
>> */
>> @@ -692,6 +695,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>> goto out_child;
>> }
>>
>> + t1 = rdclock();
>> + walltime_nsecs = t1 - t0;
>
> The walltime can be later computed by the difference of the first and
> the last time stamp after sorting the events. So you don't need the new header.
>
> -Andi
>

Good point. Can you please help me how to compute a function percentage usage in perf annotate ;) ?

Thanks,
Martin

2015-05-25 15:14:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] Add --show-total-period for perf annotate

> >perf diff does not handle this? Especially with the differential
> >profiling options it should.
>
> It does not work if you, in my case, compare ICC and GCC, where ICC uses a different mangling
> scheme for fortran modules. Moreover, situation can be more complicated if a compiler performs
> a bit different inlining decisions.

I suppose it could be enhanced with an input file that describes
equivalent functions. But yes wouldn't work for inlining.

> Good point. Can you please help me how to compute a function percentage usage in perf annotate ;) ?

I wouldn't use time at all. Just sum up periods and then compute the
percentage. The period sum computation already happens in the main view, and
is displayed there. So you only need to save that value somewhere and
then use it in the annotate display for another column.

# Samples: 24 of event 'cycles'
# Event count (approx.): 8856637

-Andi

--
[email protected] -- Speaking for myself only.

2015-05-26 13:15:53

by Martin Liška

[permalink] [raw]
Subject: Re: [RFC] Add --show-total-period for perf annotate

On 05/25/2015 05:14 PM, Andi Kleen wrote:
>>> perf diff does not handle this? Especially with the differential
>>> profiling options it should.
>>
>> It does not work if you, in my case, compare ICC and GCC, where ICC uses a different mangling
>> scheme for fortran modules. Moreover, situation can be more complicated if a compiler performs
>> a bit different inlining decisions.
>
> I suppose it could be enhanced with an input file that describes
> equivalent functions. But yes wouldn't work for inlining.
>
>> Good point. Can you please help me how to compute a function percentage usage in perf annotate ;) ?
>
> I wouldn't use time at all. Just sum up periods and then compute the
> percentage. The period sum computation already happens in the main view, and
> is displayed there. So you only need to save that value somewhere and
> then use it in the annotate display for another column.
>
> # Samples: 24 of event 'cycles'
> # Event count (approx.): 8856637
>
> -Andi
>

Hello.

Are you talking about summing cycles and compute global percentage for each isntruction?

Anyway, attached patch is capable of displaying milliseconds approximation for each instruction.

Example:

time ./perf record ./a.out
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.743 MB perf.data (45386 samples) ]

real 0m11.465s
user 0m11.424s
sys 0m0.045s

$ perf report --stdio

32.69% a.out libc-2.19.so [.] __random_r
26.66% a.out libc-2.19.so [.] __random
12.55% a.out a.out [.] foo
10.14% a.out a.out [.] bar
9.27% a.out a.out [.] baz
7.22% a.out libc-2.19.so [.] rand
1.37% a.out a.out [.] rand@plt

$ perf annotate --stdio

Percent | Source code & Disassembly of libc-2.19.so for cycles
--------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 0000000000038890 <random_r>:
: __random_r():
11.12 : 38890: test %rdi,%rdi


$ perf annotate --stdio --show-total-period
Percent | Source code & Disassembly of libc-2.19.so for cycles
--------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 0000000000038890 <random_r>:
: __random_r():
413 : 38890: test %rdi,%rdi

First `test` instruction: 11465*0.3269*0.1112 ~ 413ms.

Thanks for suggestions,
Martin



Attachments:
0001-First-semi-working-version.patch (5.89 kB)

2015-05-26 17:03:22

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] Add --show-total-period for perf annotate

> Anyway, attached patch is capable of displaying milliseconds approximation for each instruction.

You realize that the events perf is not counting do not directly map to
wall time? Even if you count cycles, the cycles are either stopping in idle
or changing unit as the CPU's frequencies change. For other events the
relationship is even more remote, think what happens when counting cache or
TLB misses.

Also even if it was mapping to time somehow, it's just a hit, not a
duration, so it cannot say how long a individual instruction took.

So you cannot map a sample event to time.

To do what you want you would need to use something like processor
trace, which can do exact accounting.

I think the only thing that makes sense is to account it relative to
the event counts.

-Andi

2015-06-01 11:08:24

by Martin Liška

[permalink] [raw]
Subject: Re: [RFC] Add --show-total-period for perf annotate

On 05/29/2015 05:55 PM, Andi Kleen wrote:
> Martin Liška <[email protected]> writes:
>>
>> Following patch sums samples that belong to a line in assembly language.
>> What do you think about it, would it be acceptable solution?
>
> Basic patch looks good to me now. Thanks.
>
> The only ugly thing is the global variable, perhaps that could be
> cleaned up.
>
> -Andi
>

Hello.

You are right, the global variable is not needed anymore as we do not
count any fraction related to samples related to a line in assembly language.

Please look at attached patch.

Thank you,
Martin


Attachments:
0001-perf-annotate-With-show-total-period-display-total-o-v2.patch (7.95 kB)