On Thu, 2012-09-13 at 16:19 +0900, Namhyung Kim wrote:
> When the --cumulate option is given, it'll be shown like this:
>
> $ perf report --cumulate
> (...)
> + 93.63% abc libc-2.15.so [.] __libc_start_main
> + 93.35% abc abc [.] main
> + 93.35% abc abc [.] c
> + 93.35% abc abc [.] b
> + 93.35% abc abc [.] a
> + 5.17% abc ld-2.15.so [.] _dl_map_object
> + 5.17% abc ld-2.15.so [.] _dl_map_object_from_fd
> + 1.13% abc ld-2.15.so [.] _dl_start_user
> + 1.13% abc ld-2.15.so [.] _dl_start
> + 0.29% abc perf [.] main
> + 0.29% abc perf [.] run_builtin
> + 0.29% abc perf [.] cmd_record
> + 0.29% abc libpthread-2.15.so [.] __libc_close
> + 0.07% abc ld-2.15.so [.] _start
> + 0.07% abc [kernel.kallsyms] [k] page_fault
>
> (This output came from the TUI, since the stdio output was cluttered by callchains.)
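A minimal C sketch of the accounting this output implies, assuming each sample carries a leaf-first callchain (chain[0] is the sampled function) and a period. The period goes to the leaf's self total and, once, to every distinct function on the chain. That is why main, c, b and a can all show the same 93.35%. The names `struct sample`, `add_sample` and `find_or_add` are hypothetical and are not perf's internals. The fixed-size tables only keep the sketch short.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS  16
#define MAX_DEPTH 16

/* Hypothetical sample: chain[0] = leaf, chain[depth-1] = outermost caller. */
struct sample {
    const char *chain[MAX_DEPTH];
    int depth;
    unsigned long period;
};

/* Hypothetical per-symbol totals. */
struct entry {
    const char *sym;
    unsigned long self;       /* period where sym was the sampled leaf */
    unsigned long cumulative; /* period where sym was anywhere on the chain */
};

static struct entry entries[MAX_SYMS];
static int nr_entries;

/* Linear lookup; adds a zeroed entry if the symbol is new. */
static struct entry *find_or_add(const char *sym)
{
    for (int i = 0; i < nr_entries; i++)
        if (strcmp(entries[i].sym, sym) == 0)
            return &entries[i];
    assert(nr_entries < MAX_SYMS);
    entries[nr_entries].sym = sym;
    entries[nr_entries].self = 0;
    entries[nr_entries].cumulative = 0;
    return &entries[nr_entries++];
}

static void add_sample(const struct sample *s)
{
    assert(s->depth > 0 && s->depth <= MAX_DEPTH);
    find_or_add(s->chain[0])->self += s->period;

    /* Credit the period once per distinct function on the chain, so a
     * recursive function is not counted twice for the same sample. */
    for (int i = 0; i < s->depth; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (strcmp(s->chain[j], s->chain[i]) == 0)
                seen = 1;
        if (!seen)
            find_or_add(s->chain[i])->cumulative += s->period;
    }
}
```

Crediting a function only once per sample is a choice made in this sketch so that recursion cannot push a function's cumulative share above 100%.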
Right, so I tried this, and I would expect the callchains to be inverted
too, so that when I expand, say, 'c', I would see that 'c' calls 'b' for
100%, which calls 'a' for 100%.
Instead I get the regular callchains: expanding 'c' shows that main calls
it for 100%.
Adding -G (invert callchains) doesn't make it better; in that case, when
I expand 'c', the chain starts at '__libc_start_main' instead of 'c'.
Is there anything I'm missing?
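What Peter describes can be sketched as a callee tree built under one entry. For every sample that contains 'c', walk from c's frame toward the leaf and add the sample's period along that path. This is a hypothetical C illustration, not how perf builds its callchains. The names `struct node`, `add_callees` and `free_tree` are made up here. The sketch assumes chains are stored leaf-first (chain[0] is the sampled function).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 16
#define MAX_KIDS  8

/* Hypothetical sample: chain[0] = leaf, chain[depth-1] = outermost caller. */
struct sample {
    const char *chain[MAX_DEPTH];
    int depth;
    unsigned long period;
};

/* One node of the callee tree hanging below the expanded entry. */
struct node {
    const char *sym;
    unsigned long period; /* period of samples passing through this node */
    struct node *kids[MAX_KIDS];
    int nr_kids;
};

static struct node *new_node(const char *sym)
{
    struct node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->sym = sym;
    return n;
}

/* Return the child of parent named sym, creating it if needed. */
static struct node *child(struct node *parent, const char *sym)
{
    for (int i = 0; i < parent->nr_kids; i++)
        if (strcmp(parent->kids[i]->sym, sym) == 0)
            return parent->kids[i];
    assert(parent->nr_kids < MAX_KIDS);
    struct node *n = new_node(sym);
    parent->kids[parent->nr_kids++] = n;
    return n;
}

/* Add one sample to the tree rooted at root->sym: locate the outermost
 * frame of that symbol, then descend toward the leaf (the callees). */
static void add_callees(struct node *root, const struct sample *s)
{
    int pos = -1;
    for (int i = s->depth - 1; i >= 0; i--) {
        if (strcmp(s->chain[i], root->sym) == 0) {
            pos = i;
            break;
        }
    }
    if (pos < 0)
        return; /* this sample never passed through root->sym */

    root->period += s->period;
    struct node *n = root;
    for (int i = pos - 1; i >= 0; i--) {
        n = child(n, s->chain[i]);
        n->period += s->period;
    }
}

static void free_tree(struct node *n)
{
    for (int i = 0; i < n->nr_kids; i++)
        free_tree(n->kids[i]);
    free(n);
}
```

Dividing each child's period by the root's period gives the "'c' calls 'b' for 100%, which calls 'a' for 100%" view whenever every sample through c also reaches b and a. Starting from the outermost frame of the symbol is a choice made here so that recursive calls land in a single subtree.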
On 10/29/12 12:08 PM, Peter Zijlstra wrote:
> Right, so I tried this, and I would expect the callchains to be inverted
> too, so that when I expand, say, 'c', I would see that 'c' calls 'b' for
> 100%, which calls 'a' for 100%.
>
> Instead I get the regular callchains: expanding 'c' shows that main calls
> it for 100%.
>
> Adding -G (invert callchains) doesn't make it better; in that case, when
> I expand 'c', the chain starts at '__libc_start_main' instead of 'c'.
>
> Is there anything I'm missing?
>
Sounds like a reasonable expectation.
I mainly tested:
perf report --cumulate -g graph,100,callee
to find the functions with a large amount of CPU time underneath, and then
examined the callgraph without --cumulate. But yeah, it'd be nice to be
able to do both in a single invocation.
Also, when callgraphs are displayed, the percentages are off (> 100%).
Namhyung probably needs to use he->stat_acc->period in a few places as
the denominator instead of he->period.
-Arun
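The fix Arun suggests comes down to the denominator. Below is a hypothetical C sketch with a simplified struct whose two fields stand in for he->period and he->stat_acc->period. perf's real layout differs. Take an entry like 'c' whose own (self) period is small. Its callchain nodes presumably add up to the accumulated period, so dividing them by the self period overshoots 100%.

```c
#include <assert.h>

/* Hypothetical stand-in for a histogram entry: the fields mirror
 * he->period (self) and he->stat_acc->period (accumulated) from the
 * mail above, not perf's actual struct layout. */
struct entry_sketch {
    unsigned long self_period;
    unsigned long acc_period;
};

/* Share of a callchain node relative to the accumulated period of the
 * entry it is displayed under; stays within 0..100%. */
static double chain_percent(const struct entry_sketch *he,
                            unsigned long node_period)
{
    unsigned long total = he->acc_period;
    assert(node_period <= total);
    return total ? 100.0 * (double)node_period / (double)total : 0.0;
}

/* The symptom: the same node divided by the self period only. */
static double chain_percent_self(const struct entry_sketch *he,
                                 unsigned long node_period)
{
    return he->self_period
        ? 100.0 * (double)node_period / (double)he->self_period
        : 0.0;
}
```

For example, with a self period of 10, an accumulated period of 90 and a child node of 80, the first function gives about 88.9% while the second gives 800%.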
Hi Arun and Peter,
On Mon, 29 Oct 2012 14:36:01 -0700, Arun Sharma wrote:
> On 10/29/12 12:08 PM, Peter Zijlstra wrote:
>
>> Right, so I tried this, and I would expect the callchains to be inverted
>> too, so that when I expand, say, 'c', I would see that 'c' calls 'b' for
>> 100%, which calls 'a' for 100%.
>>
>> Instead I get the regular callchains: expanding 'c' shows that main calls
>> it for 100%.
>>
>> Adding -G (invert callchains) doesn't make it better; in that case, when
>> I expand 'c', the chain starts at '__libc_start_main' instead of 'c'.
>>
>> Is there anything I'm missing?
>>
>
> Sounds like a reasonable expectation.
>
> I mainly tested:
>
> perf report --cumulate -g graph,100,callee
>
> to find the functions with a large amount of CPU time underneath, and
> then examined the callgraph without --cumulate. But yeah, it'd be nice
> to be able to do both in a single invocation.
Yes, the callchain part needs to be improved. Peter's idea indeed looks
good to me too.
But before doing that, I'd like to reach agreement on how to
design and implement this feature.
Frederic (and Stephane), sorry for bothering you multiple times about
this, but I didn't understand exactly what you want. IIUC, you don't want
a --cumulate option, but would rather share the branch sampling code to
implement it, right?
But IMHO the branch sampling output doesn't look like a good fit for
--cumulate usage. Could you give me some advice?
>
> Also, when callgraphs are displayed, the percentages are off (>
> 100%). Namhyung probably needs to use he->stat_acc->period in a few
> places as the denominator instead of he->period.
I will look into it later.
Thanks,
Namhyung
On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
> Yes, the callchain part needs to be improved. Peter's idea indeed looks
> good to me too.
FWIW, I think this is exactly what sysprof does, except that tool isn't
usable for other reasons.. You might want to look at it though.
* Peter Zijlstra wrote:
> On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
> > Yes, the callchain part needs to be improved. Peter's idea
> > indeed looks good to me too.
>
> FWIW, I think this is exactly what sysprof does, except that
> tool isn't usable for other reasons.. You might want to look
> at it though.
I always found the fundamental sysprof system-wide call graph
profiling output/view superior (and so do many of the Xorg developers
using SysProf whom I talked to), so I'd strongly encourage using
that ordering and grouping for the default perf call-graph
profiling output/view.
Thanks,
Ingo
On Tue, 30 Oct 2012 10:01:10 +0100, Ingo Molnar wrote:
> * Peter Zijlstra wrote:
>
>> On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
>
>> > Yes, the callchain part needs to be improved. Peter's idea
>> > indeed looks good to me too.
>>
>> FWIW, I think this is exactly what sysprof does, except that
>> tool isn't usable for other reasons.. You might want to look
>> at it though.
>
> I always found the fundamental sysprof system-wide call graph
> profiling output/view superior (and so do many of the Xorg developers
> using SysProf whom I talked to), so I'd strongly encourage using
> that ordering and grouping for the default perf call-graph
> profiling output/view.
Okay, I'll look at sysprof.
Anyway, do you have any other comments on the general --cumulate
approach in this series (esp. with --branch-stack)?
Thanks,
Namhyung