2017-06-28 03:18:22

by Taeung Song

[permalink] [raw]
Subject: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

Hi,

The --source-only option and new source code TUI view can show
the result of performance analysis based on full source code per symbol(function).
(Namhyung Kim told me this idea and it was also requested by others some time ago..)

If someone wants to see the cause, he/she will need to dig into the asm.
But before that, looking at the source level can give a hint or clue
for the problem.

For example, if target symbol is 'hex2u64' of util/util.c,
the output is like below.

$ perf annotate --source-only --stdio -s hex2u64
Percent | Source code of util.c for cycles:ppp (42 samples)
-----------------------------------------------------------------
0.00 : 354 * While we find nice hex chars, build a long_val.
0.00 : 355 * Return number of chars processed.
0.00 : 356 */
0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
2.38 : 358 {
2.38 : 359 const char *p = ptr;
0.00 : 360 *long_val = 0;
0.00 : 361
30.95 : 362 while (*p) {
23.81 : 363 const int hex_val = hex(*p);
0.00 : 364
14.29 : 365 if (hex_val < 0)
0.00 : 366 break;
0.00 : 367
26.19 : 368 *long_val = (*long_val << 4) | hex_val;
0.00 : 369 p++;
0.00 : 370 }
0.00 : 371
0.00 : 372 return p - ptr;
0.00 : 373 }

And I added many perf developers into Cc: because I want to listen to your opinions
about this new feature, if you don't mind.

If you give some feedback, I'd appreciate it! :)

Thanks,
Taeung

Taeung Song (4):
perf annotate: Add --source-only option
perf annotate: Add new source code view to the annotate TUI browser
perf annotate: Fold or unfold partial disassembly lines on source code
view
perf annotate: Support a 'o' key showing addresses on the new source
code view

tools/perf/builtin-annotate.c | 2 +
tools/perf/ui/browser.h | 1 +
tools/perf/ui/browsers/annotate.c | 307 +++++++++++++++++++++++++++++++++++++-
tools/perf/util/annotate.c | 303 ++++++++++++++++++++++++++++++++++++-
tools/perf/util/annotate.h | 30 ++++
tools/perf/util/symbol.c | 1 +
tools/perf/util/symbol.h | 1 +
7 files changed, 633 insertions(+), 12 deletions(-)

--
2.7.4


2017-06-28 03:26:56

by Taeung Song

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

Additionally,
the code is available on 'annotate/src-only' branch at

git://github.com/taeung/linux-perf.git

Thanks,
Taeung

On 06/28/2017 12:18 PM, Taeung Song wrote:
> Hi,
>
> The --source-only option and new source code TUI view can show
> the result of performance analysis based on full source code per symbol(function).
> (Namhyung Kim told me this idea and it was also requested by others some time ago..)
>
> If someone wants to see the cause, he/she will need to dig into the asm.
> But before that, looking at the source level can give a hint or clue
> for the problem.
>
> For example, if target symbol is 'hex2u64' of util/util.c,
> the output is like below.
>
> $ perf annotate --source-only --stdio -s hex2u64
> Percent | Source code of util.c for cycles:ppp (42 samples)
> -----------------------------------------------------------------
> 0.00 : 354 * While we find nice hex chars, build a long_val.
> 0.00 : 355 * Return number of chars processed.
> 0.00 : 356 */
> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
> 2.38 : 358 {
> 2.38 : 359 const char *p = ptr;
> 0.00 : 360 *long_val = 0;
> 0.00 : 361
> 30.95 : 362 while (*p) {
> 23.81 : 363 const int hex_val = hex(*p);
> 0.00 : 364
> 14.29 : 365 if (hex_val < 0)
> 0.00 : 366 break;
> 0.00 : 367
> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
> 0.00 : 369 p++;
> 0.00 : 370 }
> 0.00 : 371
> 0.00 : 372 return p - ptr;
> 0.00 : 373 }
>
> And I added many perf developers into Cc: because I want to listen to your opinions
> about this new feature, if you don't mind.
>
> If you give some feedback, I'd appreciate it! :)
>
> Thanks,
> Taeung
>
> Taeung Song (4):
> perf annotate: Add --source-only option
> perf annotate: Add new source code view to the annotate TUI browser
> perf annotate: Fold or unfold partial disassembly lines on source code
> view
> perf annotate: Support a 'o' key showing addresses on the new source
> code view
>
> tools/perf/builtin-annotate.c | 2 +
> tools/perf/ui/browser.h | 1 +
> tools/perf/ui/browsers/annotate.c | 307 +++++++++++++++++++++++++++++++++++++-
> tools/perf/util/annotate.c | 303 ++++++++++++++++++++++++++++++++++++-
> tools/perf/util/annotate.h | 30 ++++
> tools/perf/util/symbol.c | 1 +
> tools/perf/util/symbol.h | 1 +
> 7 files changed, 633 insertions(+), 12 deletions(-)
>

2017-06-28 09:53:33

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> Hi,
>
> The --source-only option and new source code TUI view can show
> the result of performance analysis based on full source code per
> symbol(function). (Namhyung Kim told me this idea and it was also requested
> by others some time ago..)
>
> If someone wants to see the cause, he/she will need to dig into the asm.
> But before that, looking at the source level can give a hint or clue
> for the problem.
>
> For example, if target symbol is 'hex2u64' of util/util.c,
> the output is like below.
>
> $ perf annotate --source-only --stdio -s hex2u64
> Percent | Source code of util.c for cycles:ppp (42 samples)
> -----------------------------------------------------------------
> 0.00 : 354 * While we find nice hex chars, build a long_val.
> 0.00 : 355 * Return number of chars processed.
> 0.00 : 356 */
> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
> 2.38 : 358 {
> 2.38 : 359 const char *p = ptr;
> 0.00 : 360 *long_val = 0;
> 0.00 : 361
> 30.95 : 362 while (*p) {
> 23.81 : 363 const int hex_val = hex(*p);
> 0.00 : 364
> 14.29 : 365 if (hex_val < 0)
> 0.00 : 366 break;
> 0.00 : 367
> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
> 0.00 : 369 p++;
> 0.00 : 370 }
> 0.00 : 371
> 0.00 : 372 return p - ptr;
> 0.00 : 373 }
>
> And I added many perf developers into Cc: because I want to listen to your
> opinions about this new feature, if you don't mind.
>
> If you give some feedback, I'd appreciate it! :)

Thanks Taeung,

I requested this feature some time ago and it's really cool to see someone
step up and implement it - much appreciated!

I just tested it out on my pet-example that leverages C++ instead of C:

~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 10000000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << '\n';
return 0;
}
~~~~~

Compile it with:

g++ -O2 -g -std=c++11 test.cpp -o test

Then record it with perf:

perf record --call-graph dwarf ./test

Then analyse it with `perf report`. You'll see one entry for main with
something like:

+ 100.00% 39.69% cpp-inlining cpp-inlining [.] main

Select it and annotate it, then switch to your new source-only view:

main test.cpp
│ 30
│ 31 using namespace std;
│ 32
│ 33 int main()
│+ 34 {
│ 35 uniform_real_distribution<double> uniform(-1E5, 1E5);
│ 36 default_random_engine engine;
│+ 37 double s = 0;
│+ 38 for (int i = 0; i < 10000000; ++i) {
4.88 │+ 39 s += norm(complex<double>(uniform(engine),
uniform(engine)));
│ 40 }
│ 41 cout << s << '\n';
│ 42 return 0;
│+ 43 }

Note: the line numbers are off b/c my file contains a file-header on-top.
Ignore that.

Note2: There is no column header shown, so it's unclear what the first column
represents.

Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
shows 4.88... What is that?

What this shows, is that it's extremely important to visualize inclusive cost
_and_ self cost in this view. Additionally, we need to account for inlining.
Right now, we only see the self cost that is directly within main, I suspect.
For C++ this is usually very misleading, and basically makes the annotate view
completely useless for application-level profiling. If a second column would
be added with the inclusive cost with the ability to drill down, then I could
easily see myself using this view.

I would appreciate if you could take this into account.

Thanks a lot


--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Attachments:
smime.p7s (3.74 kB)

2017-06-28 16:27:47

by Taeung Song

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view



On 06/28/2017 06:53 PM, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>> Hi,
>>
>> The --source-only option and new source code TUI view can show
>> the result of performance analysis based on full source code per
>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>> by others some time ago..)
>>
>> If someone wants to see the cause, he/she will need to dig into the asm.
>> But before that, looking at the source level can give a hint or clue
>> for the problem.
>>
>> For example, if target symbol is 'hex2u64' of util/util.c,
>> the output is like below.
>>
>> $ perf annotate --source-only --stdio -s hex2u64
>> Percent | Source code of util.c for cycles:ppp (42 samples)
>> -----------------------------------------------------------------
>> 0.00 : 354 * While we find nice hex chars, build a long_val.
>> 0.00 : 355 * Return number of chars processed.
>> 0.00 : 356 */
>> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
>> 2.38 : 358 {
>> 2.38 : 359 const char *p = ptr;
>> 0.00 : 360 *long_val = 0;
>> 0.00 : 361
>> 30.95 : 362 while (*p) {
>> 23.81 : 363 const int hex_val = hex(*p);
>> 0.00 : 364
>> 14.29 : 365 if (hex_val < 0)
>> 0.00 : 366 break;
>> 0.00 : 367
>> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
>> 0.00 : 369 p++;
>> 0.00 : 370 }
>> 0.00 : 371
>> 0.00 : 372 return p - ptr;
>> 0.00 : 373 }
>>
>> And I added many perf developers into Cc: because I want to listen to your
>> opinions about this new feature, if you don't mind.
>>
>> If you give some feedback, I'd appreciate it! :)
>
> Thanks Taeung,
>
> I requested this feature some time ago and it's really cool to see someone
> step up and implement it - much appreciated!

Thank you so much, Milian !! :)

>
> I just tested it out on my pet-example that leverages C++ instead of C:
>
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 10000000; ++i) {
> s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ~~~~~
>
> Compile it with:
>
> g++ -O2 -g -std=c++11 test.cpp -o test
>
> Then record it with perf:
>
> perf record --call-graph dwarf ./test
>
> Then analyse it with `perf report`. You'll see one entry for main with
> something like:
>
> + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
>
> Select it and annotate it, then switch to your new source-only view:
>
> main test.cpp
> │ 30
> │ 31 using namespace std;
> │ 32
> │ 33 int main()
> │+ 34 {
> │ 35 uniform_real_distribution<double> uniform(-1E5, 1E5);
> │ 36 default_random_engine engine;
> │+ 37 double s = 0;
> │+ 38 for (int i = 0; i < 10000000; ++i) {
> 4.88 │+ 39 s += norm(complex<double>(uniform(engine),
> uniform(engine)));
> │ 40 }
> │ 41 cout << s << '\n';
> │ 42 return 0;
> │+ 43 }
>
> Note: the line numbers are off b/c my file contains a file-header on-top.
> Ignore that.
>
> Note2: There is no column header shown, so it's unclear what the first column
> represents.
>
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
> shows 4.88... What is that?
>
> What this shows, is that it's extremely important to visualize inclusive cost
> _and_ self cost in this view. Additionally, we need to account for inlining.
> Right now, we only see the self cost that is directly within main, I suspect.
> For C++ this is usually very misleading, and basically makes the annotate view
> completely useless for application-level profiling. If a second column would
> be added with the inclusive cost with the ability to drill down, then I could
> easily see myself using this view.
>
> I would appreciate if you could take this into account.
>
> Thanks a lot
>
>

Sure, I got it.
I'll investigate this weird case and recheck this patchset based on your
comments,
and then I'll reply again. :)

Thanks,
Taeung

2017-06-28 16:33:13

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

On Wednesday, June 28, 2017 6:27:34 PM CEST Taeung Song wrote:
> On 06/28/2017 06:53 PM, Milian Wolff wrote:
> > On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> >> Hi,
> >>
> >> The --source-only option and new source code TUI view can show
> >> the result of performance analysis based on full source code per
> >> symbol(function). (Namhyung Kim told me this idea and it was also
> >> requested
> >> by others some time ago..)
> >>
> >> If someone wants to see the cause, he/she will need to dig into the asm.
> >> But before that, looking at the source level can give a hint or clue
> >> for the problem.
> >>
> >> For example, if target symbol is 'hex2u64' of util/util.c,
> >> the output is like below.
> >>
> >> $ perf annotate --source-only --stdio -s hex2u64
> >>
> >> Percent | Source code of util.c for cycles:ppp (42 samples)
> >>
> >> -----------------------------------------------------------------
> >>
> >> 0.00 : 354 * While we find nice hex chars, build a long_val.
> >> 0.00 : 355 * Return number of chars processed.
> >> 0.00 : 356 */
> >> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
> >> 2.38 : 358 {
> >> 2.38 : 359 const char *p = ptr;
> >> 0.00 : 360 *long_val = 0;
> >> 0.00 : 361
> >>
> >> 30.95 : 362 while (*p) {
> >> 23.81 : 363 const int hex_val = hex(*p);
> >>
> >> 0.00 : 364
> >>
> >> 14.29 : 365 if (hex_val < 0)
> >>
> >> 0.00 : 366 break;
> >> 0.00 : 367
> >>
> >> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
> >>
> >> 0.00 : 369 p++;
> >> 0.00 : 370 }
> >> 0.00 : 371
> >> 0.00 : 372 return p - ptr;
> >> 0.00 : 373 }
> >>
> >> And I added many perf developers into Cc: because I want to listen to
> >> your
> >> opinions about this new feature, if you don't mind.
> >>
> >> If you give some feedback, I'd appreciate it! :)
> >
> > Thanks Taeung,
> >
> > I requested this feature some time ago and it's really cool to see someone
> > step up and implement it - much appreciated!
>
> Thank you so much, Milian !! :)
>
> > I just tested it out on my pet-example that leverages C++ instead of C:
> >
> > ~~~~~
> > #include <complex>
> > #include <cmath>
> > #include <random>
> > #include <iostream>
> >
> > using namespace std;
> >
> > int main()
> > {
> >
> > uniform_real_distribution<double> uniform(-1E5, 1E5);
> > default_random_engine engine;
> > double s = 0;
> > for (int i = 0; i < 10000000; ++i) {
> >
> > s += norm(complex<double>(uniform(engine), uniform(engine)));
> >
> > }
> > cout << s << '\n';
> > return 0;
> >
> > }
> > ~~~~~
> >
> > Compile it with:
> >
> > g++ -O2 -g -std=c++11 test.cpp -o test
> >
> > Then record it with perf:
> >
> > perf record --call-graph dwarf ./test
> >
> > Then analyse it with `perf report`. You'll see one entry for main with
> > something like:
> >
> > + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
> >
> > Select it and annotate it, then switch to your new source-only view:
> >
> > main test.cpp
> >
> > │ 30
> > │ 31 using namespace std;
> > │ 32
> > │ 33 int main()
> > │+ 34 {
> > │ 35 uniform_real_distribution<double> uniform(-1E5, 1E5);
> > │ 36 default_random_engine engine;
> > │+ 37 double s = 0;
> > │+ 38 for (int i = 0; i < 10000000; ++i) {
> >
> > 4.88 │+ 39 s += norm(complex<double>(uniform(engine),
> >
> > uniform(engine)));
> >
> > │ 40 }
> > │ 41 cout << s << '\n';
> > │ 42 return 0;
> > │+ 43 }
> >
> > Note: the line numbers are off b/c my file contains a file-header on-top.
> > Ignore that.
> >
> > Note2: There is no column header shown, so it's unclear what the first
> > column represents.
> >
> > Note 3: report showed 39.69% self cost in main, 100.00% inclusive.
> > annotate
> > shows 4.88... What is that?
> >
> > What this shows, is that it's extremely important to visualize inclusive
> > cost _and_ self cost in this view. Additionally, we need to account for
> > inlining. Right now, we only see the self cost that is directly within
> > main, I suspect. For C++ this is usually very misleading, and basically
> > makes the annotate view completely useless for application-level
> > profiling. If a second column would be added with the inclusive cost with
> > the ability to drill down, then I could easily see myself using this
> > view.
> >
> > I would appreciate if you could take this into account.
> >
> > Thanks a lot
>
> Sure, I got it.
> I'll investigate this weird case and recheck this patchset based on your
> comments,
> and then I'll reply again. :)

Cool, I'm happy to test this. Note though that this is not really a "weird
case" for a C++ developer. It's rather the norm of what we have to deal
with...

Cheers

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Attachments:
smime.p7s (3.74 kB)

2017-06-29 07:11:15

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

Hello,

On Wed, Jun 28, 2017 at 11:53:22AM +0200, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> > Hi,
> >
> > The --source-only option and new source code TUI view can show
> > the result of performance analysis based on full source code per
> > symbol(function). (Namhyung Kim told me this idea and it was also requested
> > by others some time ago..)
> >
> > If someone wants to see the cause, he/she will need to dig into the asm.
> > But before that, looking at the source level can give a hint or clue
> > for the problem.
> >
> > For example, if target symbol is 'hex2u64' of util/util.c,
> > the output is like below.
> >
> > $ perf annotate --source-only --stdio -s hex2u64
> > Percent | Source code of util.c for cycles:ppp (42 samples)
> > -----------------------------------------------------------------
> > 0.00 : 354 * While we find nice hex chars, build a long_val.
> > 0.00 : 355 * Return number of chars processed.
> > 0.00 : 356 */
> > 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
> > 2.38 : 358 {
> > 2.38 : 359 const char *p = ptr;
> > 0.00 : 360 *long_val = 0;
> > 0.00 : 361
> > 30.95 : 362 while (*p) {
> > 23.81 : 363 const int hex_val = hex(*p);
> > 0.00 : 364
> > 14.29 : 365 if (hex_val < 0)
> > 0.00 : 366 break;
> > 0.00 : 367
> > 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
> > 0.00 : 369 p++;
> > 0.00 : 370 }
> > 0.00 : 371
> > 0.00 : 372 return p - ptr;
> > 0.00 : 373 }
> >
> > And I added many perf developers into Cc: because I want to listen to your
> > opinions about this new feature, if you don't mind.
> >
> > If you give some feedback, I'd appreciate it! :)
>
> Thanks Taeung,
>
> I requested this feature some time ago and it's really cool to see someone
> step up and implement it - much appreciated!
>
> I just tested it out on my pet-example that leverages C++ instead of C:
>
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 10000000; ++i) {
> s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ~~~~~
>
> Compile it with:
>
> g++ -O2 -g -std=c++11 test.cpp -o test
>
> Then record it with perf:
>
> perf record --call-graph dwarf ./test
>
> Then analyse it with `perf report`. You'll see one entry for main with
> something like:
>
> + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
>
> Select it and annotate it, then switch to your new source-only view:
>
> main test.cpp
> │ 30 > │ 31 using namespace std; > │ 32 > │ 33 int main() > │+ 34 { > │ 35 uniform_real_distribution<double> uniform(-1E5, 1E5); > │ 36 default_random_engine engine; > │+ 37 double s = 0; > │+ 38 for (int i = 0; i < 10000000; ++i) { > 4.88 │+ 39 s += norm(complex<double>(uniform(engine), uniform(engine))); > │ 40 } > │ 41 cout << s << '\n'; > │ 42 return 0; > │+ 43 }
>
> Note: the line numbers are off b/c my file contains a file-header on-top.
> Ignore that.
>
> Note2: There is no column header shown, so it's unclear what the first column
> represents.
>
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
> shows 4.88... What is that?
>
> What this shows, is that it's extremely important to visualize inclusive cost
> _and_ self cost in this view. Additionally, we need to account for inlining.
> Right now, we only see the self cost that is directly within main, I suspect.

Currently perf annotate doesn't use the sample period, it uses sample
count instead and print the percentage within the function. So it's a
different number to the perf report. I think we need to fix this
first.

Thanks,
Namhyung


> For C++ this is usually very misleading, and basically makes the annotate view
> completely useless for application-level profiling. If a second column would
> be added with the inclusive cost with the ability to drill down, then I could
> easily see myself using this view.
>
> I would appreciate if you could take this into account.
>
> Thanks a lot
>
>
> --
> Milian Wolff | [email protected] | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts


2017-06-30 07:14:54

by Taeung Song

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

Hi,

On 06/29/2017 04:11 PM, Namhyung Kim wrote:
> Hello,
>
> On Wed, Jun 28, 2017 at 11:53:22AM +0200, Milian Wolff wrote:
>> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>>> Hi,
>>>
>>> The --source-only option and new source code TUI view can show
>>> the result of performance analysis based on full source code per
>>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>>> by others some time ago..)
>>>
>>> If someone wants to see the cause, he/she will need to dig into the asm.
>>> But before that, looking at the source level can give a hint or clue
>>> for the problem.
>>>
>>> For example, if target symbol is 'hex2u64' of util/util.c,
>>> the output is like below.
>>>
>>> $ perf annotate --source-only --stdio -s hex2u64
>>> Percent | Source code of util.c for cycles:ppp (42 samples)
>>> -----------------------------------------------------------------
>>> 0.00 : 354 * While we find nice hex chars, build a long_val.
>>> 0.00 : 355 * Return number of chars processed.
>>> 0.00 : 356 */
>>> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
>>> 2.38 : 358 {
>>> 2.38 : 359 const char *p = ptr;
>>> 0.00 : 360 *long_val = 0;
>>> 0.00 : 361
>>> 30.95 : 362 while (*p) {
>>> 23.81 : 363 const int hex_val = hex(*p);
>>> 0.00 : 364
>>> 14.29 : 365 if (hex_val < 0)
>>> 0.00 : 366 break;
>>> 0.00 : 367
>>> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
>>> 0.00 : 369 p++;
>>> 0.00 : 370 }
>>> 0.00 : 371
>>> 0.00 : 372 return p - ptr;
>>> 0.00 : 373 }
>>>
>>> And I added many perf developers into Cc: because I want to listen to your
>>> opinions about this new feature, if you don't mind.
>>>
>>> If you give some feedback, I'd appreciate it! :)
>>
>> Thanks Taeung,
>>
>> I requested this feature some time ago and it's really cool to see someone
>> step up and implement it - much appreciated!
>>
>> I just tested it out on my pet-example that leverages C++ instead of C:
>>
>> ~~~~~
>> #include <complex>
>> #include <cmath>
>> #include <random>
>> #include <iostream>
>>
>> using namespace std;
>>
>> int main()
>> {
>> uniform_real_distribution<double> uniform(-1E5, 1E5);
>> default_random_engine engine;
>> double s = 0;
>> for (int i = 0; i < 10000000; ++i) {
>> s += norm(complex<double>(uniform(engine), uniform(engine)));
>> }
>> cout << s << '\n';
>> return 0;
>> }
>> ~~~~~
>>
>> Compile it with:
>>
>> g++ -O2 -g -std=c++11 test.cpp -o test
>>
>> Then record it with perf:
>>
>> perf record --call-graph dwarf ./test
>>
>> Then analyse it with `perf report`. You'll see one entry for main with
>> something like:
>>
>> + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
>>
>> Select it and annotate it, then switch to your new source-only view:
>>
>> main test.cpp
>> │ 30 > │ 31 using namespace std; > │ 32 > │ 33 int main() > │+ 34 { > │ 35 uniform_real_distribution<double> uniform(-1E5, 1E5); > │ 36 default_random_engine engine; > │+ 37 double s = 0; > │+ 38 for (int i = 0; i < 10000000; ++i) { > 4.88 │+ 39 s += norm(complex<double>(uniform(engine), uniform(engine)));
> │ 40 } > │ 41 cout << s << '\n'; > │ 42 return 0; > │+ 43 }
>>
>> Note: the line numbers are off b/c my file contains a file-header on-top.
>> Ignore that.
>>
>> Note2: There is no column header shown, so it's unclear what the first column
>> represents.
>>
>> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
>> shows 4.88... What is that?
>>
>> What this shows, is that it's extremely important to visualize inclusive cost
>> _and_ self cost in this view. Additionally, we need to account for inlining.
>> Right now, we only see the self cost that is directly within main, I suspect.
>
> Currently perf annotate doesn't use the sample period, it uses sample
> count instead and print the percentage within the function. So it's a
> different number to the perf report. I think we need to fix this
> first.
>
> Thanks,
> Namhyung
>

I understood. Hum.. so we need to replace the menu and column about
total period to things about sample count like below ?

"t Toggle total period view"

-> "t Toggle showing the number of samples for"

(I'm not sure what a short key(e.g. 't') is proper..)

Or modifying the code related to the number of samples,
show actual total period on perf-annotate ?

What do you think about this change ?

Thanks,
Taeung

>
>> For C++ this is usually very misleading, and basically makes the annotate view
>> completely useless for application-level profiling. If a second column would
>> be added with the inclusive cost with the ability to drill down, then I could
>> easily see myself using this view.
>>
>> I would appreciate if you could take this into account.
>>
>> Thanks a lot
>>
>>
>> --
>> Milian Wolff | [email protected] | Senior Software Engineer
>> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
>> Tel: +49-30-521325470
>> KDAB - The Qt Experts
>
>

2017-06-30 16:22:00

by Taeung Song

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

I'm late..

On 06/28/2017 06:53 PM, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>> Hi,
>>
>> The --source-only option and new source code TUI view can show
>> the result of performance analysis based on full source code per
>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>> by others some time ago..)
>>
>> If someone wants to see the cause, he/she will need to dig into the asm.
>> But before that, looking at the source level can give a hint or clue
>> for the problem.
>>
>> For example, if target symbol is 'hex2u64' of util/util.c,
>> the output is like below.
>>
>> $ perf annotate --source-only --stdio -s hex2u64
>> Percent | Source code of util.c for cycles:ppp (42 samples)
>> -----------------------------------------------------------------
>> 0.00 : 354 * While we find nice hex chars, build a long_val.
>> 0.00 : 355 * Return number of chars processed.
>> 0.00 : 356 */
>> 0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
>> 2.38 : 358 {
>> 2.38 : 359 const char *p = ptr;
>> 0.00 : 360 *long_val = 0;
>> 0.00 : 361
>> 30.95 : 362 while (*p) {
>> 23.81 : 363 const int hex_val = hex(*p);
>> 0.00 : 364
>> 14.29 : 365 if (hex_val < 0)
>> 0.00 : 366 break;
>> 0.00 : 367
>> 26.19 : 368 *long_val = (*long_val << 4) | hex_val;
>> 0.00 : 369 p++;
>> 0.00 : 370 }
>> 0.00 : 371
>> 0.00 : 372 return p - ptr;
>> 0.00 : 373 }
>>
>> And I added many perf developers into Cc: because I want to listen to your
>> opinions about this new feature, if you don't mind.
>>
>> If you give some feedback, I'd appreciate it! :)
>
> Thanks Taeung,
>
> I requested this feature some time ago and it's really cool to see someone
> step up and implement it - much appreciated!
>
> I just tested it out on my pet-example that leverages C++ instead of C:
>
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 10000000; ++i) {
> s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ~~~~~
>
> Compile it with:
>
> g++ -O2 -g -std=c++11 test.cpp -o test
>
> Then record it with perf:
>
> perf record --call-graph dwarf ./test
>
> Then analyse it with `perf report`. You'll see one entry for main with
> something like:
>
> + 100.00% 39.69% cpp-inlining cpp-inlining [.] main
>
> Select it and annotate it, then switch to your new source-only view:
>
> main test.cpp
> │ 30
> │ 31 using namespace std;
> │ 32
> │ 33 int main()
> │+ 34 {
> │ 35 uniform_real_distribution<double> uniform(-1E5, 1E5);
> │ 36 default_random_engine engine;
> │+ 37 double s = 0;
> │+ 38 for (int i = 0; i < 10000000; ++i) {
> 4.88 │+ 39 s += norm(complex<double>(uniform(engine),
> uniform(engine)));
> │ 40 }
> │ 41 cout << s << '\n';
> │ 42 return 0;
> │+ 43 }
>
> Note: the line numbers are off b/c my file contains a file-header on-top.
> Ignore that.
>
> Note2: There is no column header shown, so it's unclear what the first column
> represents.

Okey, I'll add the first column.

>
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
> shows 4.88... What is that?

My case is different from your result..but I'll keep digging things
related to the inclusive and self cost to understand the above case,
remaking v2 patchset.

>
> What this shows, is that it's extremely important to visualize inclusive cost
> _and_ self cost in this view. Additionally, we need to account for inlining.
> Right now, we only see the self cost that is directly within main, I suspect.

I handled only one source code file on this patchset,
so I also think it seems to be needed to check other source code file to
handle inlining..

> For C++ this is usually very misleading, and basically makes the annotate view
> completely useless for application-level profiling. If a second column would
> be added with the inclusive cost with the ability to drill down, then I could
> easily see myself using this view.

Okey, will try to make the feature for 'drill down' after this basic
patchset.


BTW, I'll resend v2 patchset later

- considering several source files when getting srcline info
- considering inline function
- investigating the inclusive and self cost

Thanks,
Taeung

>
> I would appreciate if you could take this into account.
>
> Thanks a lot
>
>