2013-10-25 15:33:47

by Rodrigo Campos

[permalink] [raw]
Subject: State of "perf: Add a new sort order: SORT_INCLUSIVE"

Hi Namhyung,

Frederic Weisbecker and Arnaldo Carvalho de Melo told me on IRC that you were
working to forward-port a patch that adds a new sort order to perf report,
SORT_INCLUSIVE.

That will be useful for me and I was wondering if you are still working on that
or if there is a newer version than v6:

http://thread.gmane.org/gmane.linux.kernel.perf.user/882


Maybe you have a newer version of it not sent to the list ? Or do you plan to
work on it anytime soon ?

If there is any branch to test, please let me know :)




Thanks a lot,
Rodrigo


2013-10-28 05:09:53

by Namhyung Kim

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

Hi Rodrigo,

On Fri, 25 Oct 2013 16:07:21 +0100, Rodrigo Campos wrote:
> Hi Namhyung,
>
> Frederic Weisbecker and Arnaldo Carvalho de Melo told me on IRC that you were
> working to forward-port a patch that adds a new sort order to perf report,
> SORT_INCLUSIVE.
>
> That will be useful for me and I was wondering if you are still working on that
> or if there is a newer version than v6:
>
> http://thread.gmane.org/gmane.linux.kernel.perf.user/882
>
>
> Maybe you have a newer version of it not sent to the list ? Or do you plan to
> work on it anytime soon ?
>
> If there is any branch to test, please let me know :)

Yes I have a patch series for that. But it's not a new sort order but a
new command line option --culumate. I don't think it should be a new
sort order since it affects only how it counts period value on each
sample not how samples are sorted/grouped.

It's about a year since I sent this series to list. I'll work on it and
send it to the list soon. But before that I have to re-read what's the
Frederic's concern - IIRC it's about consolidating code in perf report
that does similar things on branch stack.

Frederic, can you remember what was the problem?

Anyway, You can find the series and discussion on the link below:

https://lkml.org/lkml/2012/9/13/81

Thanks,
Namhyung

2013-10-28 08:43:02

by Rodrigo Campos

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On Mon, Oct 28, 2013 at 02:09:49PM +0900, Namhyung Kim wrote:
> Hi Rodrigo,
>
> On Fri, 25 Oct 2013 16:07:21 +0100, Rodrigo Campos wrote:
> >
> > That will be useful for me and I was wondering if you are still working on that
> > or if there is a newer version than v6:
> >
> > http://thread.gmane.org/gmane.linux.kernel.perf.user/882
> >
> >
> > Maybe you have a newer version of it not sent to the list ? Or do you plan to
> > work on it anytime soon ?
> >
> > If there is any branch to test, please let me know :)
>
> Yes I have a patch series for that. But it's not a new sort order but a
> new command line option --culumate. I don't think it should be a new
> sort order since it affects only how it counts period value on each
> sample not how samples are sorted/grouped.
>
> It's about a year since I sent this series to list. I'll work on it and
> send it to the list soon. But before that I have to re-read what's the

Great, thanks a lot!

> Frederic's concern - IIRC it's about consolidating code in perf report
> that does similar things on branch stack.
>
> Frederic, can you remember what was the problem?
>
> Anyway, You can find the series and discussion on the link below:
>
> https://lkml.org/lkml/2012/9/13/81

I've read the cover letter for that series and probably because I don't know
about perf internals I have a question: How will "--culumate" interact with
"--sort=dso" for example ?

I mean, is it possible for that to show more than 100% ? (if you add all the
93.35% in your example in the cover letter, or something similar). Or
"--culumate --sort=dso" will just group together all entries that have a dso in
the call chain ?

Sorry if the question was too obvious if I knew about perf internals :S




Thanks a lot,
Rodrigo

2013-10-28 08:49:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

2013/10/28 Namhyung Kim <[email protected]>:
> Hi Rodrigo,
>
> On Fri, 25 Oct 2013 16:07:21 +0100, Rodrigo Campos wrote:
>> Hi Namhyung,
>>
>> Frederic Weisbecker and Arnaldo Carvalho de Melo told me on IRC that you were
>> working to forward-port a patch that adds a new sort order to perf report,
>> SORT_INCLUSIVE.
>>
>> That will be useful for me and I was wondering if you are still working on that
>> or if there is a newer version than v6:
>>
>> http://thread.gmane.org/gmane.linux.kernel.perf.user/882
>>
>>
>> Maybe you have a newer version of it not sent to the list ? Or do you plan to
>> work on it anytime soon ?
>>
>> If there is any branch to test, please let me know :)
>
> Yes I have a patch series for that. But it's not a new sort order but a
> new command line option --culumate. I don't think it should be a new
> sort order since it affects only how it counts period value on each
> sample not how samples are sorted/grouped.
>
> It's about a year since I sent this series to list. I'll work on it and
> send it to the list soon.

Thanks a lot!

> But before that I have to re-read what's the
> Frederic's concern - IIRC it's about consolidating code in perf report
> that does similar things on branch stack.
>
> Frederic, can you remember what was the problem?
>
> Anyway, You can find the series and discussion on the link below:
>
> https://lkml.org/lkml/2012/9/13/81

Right so my concern was that it should be an extension of "perf report
- b" rather than an ad-hoc. Cumulative callchains is basically what
perf report -b does but with callchains. And since callchains are
branches, this should be a perfect fit.

Also I don't remember if perf report -b adds branches to the hist with
a period of 1 of with the period of the referring event. I remember
there was an issue a long time ago there. May be it was fixed. Anyway
IMO the period of 1 doesn't make sense, branches should be added with
the period of the event.

Thanks.

2013-10-28 09:09:33

by Namhyung Kim

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On Mon, 28 Oct 2013 08:42:44 +0000, Rodrigo Campos wrote:
> On Mon, Oct 28, 2013 at 02:09:49PM +0900, Namhyung Kim wrote:
>> Anyway, You can find the series and discussion on the link below:
>>
>> https://lkml.org/lkml/2012/9/13/81
>
> I've read the cover letter for that series and probably because I don't know
> about perf internals I have a question: How will "--culumate" interact with
> "--sort=dso" for example ?
>
> I mean, is it possible for that to show more than 100% ? (if you add all the
> 93.35% in your example in the cover letter, or something similar). Or
> "--culumate --sort=dso" will just group together all entries that have a dso in
> the call chain ?

Hmm.. I think --cumulate option is only meaningful when sort order
includes symbol. Maybe I can add support for --sort=dso case as well
but not sure it's worth. Do you think it's really needed?

>
> Sorry if the question was too obvious if I knew about perf internals :S

No need to sorry about it. :)

Thanks,
Namhyung

2013-10-28 09:30:13

by Rodrigo Campos

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On Mon, Oct 28, 2013 at 06:09:30PM +0900, Namhyung Kim wrote:
> On Mon, 28 Oct 2013 08:42:44 +0000, Rodrigo Campos wrote:
> > On Mon, Oct 28, 2013 at 02:09:49PM +0900, Namhyung Kim wrote:
> >> Anyway, You can find the series and discussion on the link below:
> >>
> >> https://lkml.org/lkml/2012/9/13/81
> >
> > I've read the cover letter for that series and probably because I don't know
> > about perf internals I have a question: How will "--culumate" interact with
> > "--sort=dso" for example ?
> >
> > I mean, is it possible for that to show more than 100% ? (if you add all the
> > 93.35% in your example in the cover letter, or something similar). Or
> > "--culumate --sort=dso" will just group together all entries that have a dso in
> > the call chain ?
>
> Hmm.. I think --cumulate option is only meaningful when sort order
> includes symbol. Maybe I can add support for --sort=dso case as well
> but not sure it's worth. Do you think it's really needed?

I don't know if it is *needed*, but that was what I need :)

I mean, what I was looking for is a way to know the percentage spent when some
symbol in the call chain belongs to a DSO. Like group all samples that have a
symbol of a DSO in the callchain, and know the percentage for that group.
Something like the "inclusive" time spent inside a library.


> > Sorry if the question was too obvious if I knew about perf internals :S
>
> No need to sorry about it. :)

Thanks =)




Thanks again,
Rodrigo

2013-10-28 17:56:17

by Arun Sharma

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On 10/28/13 2:29 AM, Rodrigo Campos wrote:
> On Mon, Oct 28, 2013 at 06:09:30PM +0900, Namhyung Kim wrote:
>> On Mon, 28 Oct 2013 08:42:44 +0000, Rodrigo Campos wrote:
>>> On Mon, Oct 28, 2013 at 02:09:49PM +0900, Namhyung Kim wrote:
>>>> Anyway, You can find the series and discussion on the link below:
>>>>
>>>> https://lkml.org/lkml/2012/9/13/81
>>>
>>> I've read the cover letter for that series and probably because I don't know
>>> about perf internals I have a question: How will "--culumate" interact with
>>> "--sort=dso" for example ?
>>>
>>> I mean, is it possible for that to show more than 100% ? (if you add all the
>>> 93.35% in your example in the cover letter, or something similar). Or
>>> "--culumate --sort=dso" will just group together all entries that have a dso in
>>> the call chain ?
>>
>> Hmm.. I think --cumulate option is only meaningful when sort order
>> includes symbol. Maybe I can add support for --sort=dso case as well
>> but not sure it's worth. Do you think it's really needed?
>
> I don't know if it is *needed*, but that was what I need :)

I suspect that users will find creative ways of using these options to
solve real world problems and we shouldn't restrict usage any more than
we need to to protect against obvious bugs/crashes.

Also, what's the reasoning for --cumulate not being an option under perf
record -g ..,<order>?

In order to integrate perf record -b and --cumulate, we'll have to sort
out the underlying infrastructure for processing callgraphs and branch
stacks. I think the main roadblock here is that one is statistical and
on many CPUs incomplete (only top N branches are reported).

Given that there are clear use cases in production involving complex
callgraphs, I'm for getting this support in first and then reconciling
the differences with perf record -b later.

-Arun

2013-10-29 03:11:55

by Namhyung Kim

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

Hi Arun,

On Mon, 28 Oct 2013 09:43:21 -0700, Arun Sharma wrote:
> On 10/28/13 2:29 AM, Rodrigo Campos wrote:
>> On Mon, Oct 28, 2013 at 06:09:30PM +0900, Namhyung Kim wrote:
>>> On Mon, 28 Oct 2013 08:42:44 +0000, Rodrigo Campos wrote:
>>>> On Mon, Oct 28, 2013 at 02:09:49PM +0900, Namhyung Kim wrote:
>>>>> Anyway, You can find the series and discussion on the link below:
>>>>>
>>>>> https://lkml.org/lkml/2012/9/13/81
>>>>
>>>> I've read the cover letter for that series and probably because I don't know
>>>> about perf internals I have a question: How will "--culumate" interact with
>>>> "--sort=dso" for example ?
>>>>
>>>> I mean, is it possible for that to show more than 100% ? (if you add all the
>>>> 93.35% in your example in the cover letter, or something similar). Or
>>>> "--culumate --sort=dso" will just group together all entries that have a dso in
>>>> the call chain ?
>>>
>>> Hmm.. I think --cumulate option is only meaningful when sort order
>>> includes symbol. Maybe I can add support for --sort=dso case as well
>>> but not sure it's worth. Do you think it's really needed?
>>
>> I don't know if it is *needed*, but that was what I need :)
>
> I suspect that users will find creative ways of using these options to
> solve real world problems and we shouldn't restrict usage any more
> than we need to to protect against obvious bugs/crashes.
>
> Also, what's the reasoning for --cumulate not being an option under
> perf record -g ..,<order>?

Sorry, I cannot understand you. The 'perf record' just saves sample
data (and callchains) from the ring-buffer. All the processing happens
in 'perf report'. I can't see what you expect from the 'perf record
--cumulate'. Am I missing something?

>
> In order to integrate perf record -b and --cumulate, we'll have to
> sort out the underlying infrastructure for processing callgraphs and
> branch stacks. I think the main roadblock here is that one is
> statistical and on many CPUs incomplete (only top N branches are
> reported).
>
> Given that there are clear use cases in production involving complex
> callgraphs, I'm for getting this support in first and then reconciling
> the differences with perf record -b later.

I think what Frederic said is that the code de-duplication of 'perf
report' side. The branch stack and --cumulate are different - branch
stack concentrates on the branch itself but --cumulate uses callchains
to find parents and give some credit to them as side information.

Thanks,
Namhyung

2013-10-29 04:33:25

by Arun Sharma

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On 10/28/13 8:11 PM, Namhyung Kim wrote:

Hey Namhyung:

>>
>> Also, what's the reasoning for --cumulate not being an option under
>> perf record -g ..,<order>?
>
> Sorry, I cannot understand you. The 'perf record' just saves sample
> data (and callchains) from the ring-buffer. All the processing happens
> in 'perf report'. I can't see what you expect from the 'perf record
> --cumulate'. Am I missing something?

Yes - I meant to say perf report -g :)

> -g [type,min[,limit],order]

Specifically, along with callee, caller, we could have a third option.
Or we could have a new type (graph, fractal, cumulative).

>> Given that there are clear use cases in production involving complex
>> callgraphs, I'm for getting this support in first and then reconciling
>> the differences with perf record -b later.
>
> I think what Frederic said is that the code de-duplication of 'perf
> report' side. The branch stack and --cumulate are different - branch
> stack concentrates on the branch itself but --cumulate uses callchains
> to find parents and give some credit to them as side information.

Me too. I brought it up with Stephane at some point in the last year or
so and there wasn't an obvious way to de-duplicate because of these
differences.

-Arun

2013-10-29 05:25:44

by Namhyung Kim

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On Mon, 28 Oct 2013 21:10:38 -0700, Arun Sharma wrote:
> On 10/28/13 8:11 PM, Namhyung Kim wrote:
>
> Hey Namhyung:
>
>>>
>>> Also, what's the reasoning for --cumulate not being an option under
>>> perf record -g ..,<order>?
>>
>> Sorry, I cannot understand you. The 'perf record' just saves sample
>> data (and callchains) from the ring-buffer. All the processing happens
>> in 'perf report'. I can't see what you expect from the 'perf record
>> --cumulate'. Am I missing something?
>
> Yes - I meant to say perf report -g :)

:)

>
>> -g [type,min[,limit],order]
>
> Specifically, along with callee, caller, we could have a third
> option. Or we could have a new type (graph, fractal, cumulative).

That's also fine by me. But I added --cumulate since it's quite
different from other callchain behaviors.

If we go with -g option, I'd like add it as a new type.

>
>>> Given that there are clear use cases in production involving complex
>>> callgraphs, I'm for getting this support in first and then reconciling
>>> the differences with perf record -b later.
>>
>> I think what Frederic said is that the code de-duplication of 'perf
>> report' side. The branch stack and --cumulate are different - branch
>> stack concentrates on the branch itself but --cumulate uses callchains
>> to find parents and give some credit to them as side information.
>
> Me too. I brought it up with Stephane at some point in the last year
> or so and there wasn't an obvious way to de-duplicate because of these
> differences.

Yeah, looking at the code, I can hardly find how I can do it. :-/

Thanks,
Namhyung

2013-10-29 08:37:10

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: State of "perf: Add a new sort order: SORT_INCLUSIVE"

On Mon, Oct 28, 2013 at 09:10:38PM -0700, Arun Sharma wrote:
> On 10/28/13 8:11 PM, Namhyung Kim wrote:
>
> Hey Namhyung:
>
> >>
> >>Also, what's the reasoning for --cumulate not being an option under
> >>perf record -g ..,<order>?
> >
> >Sorry, I cannot understand you. The 'perf record' just saves sample
> >data (and callchains) from the ring-buffer. All the processing happens
> >in 'perf report'. I can't see what you expect from the 'perf record
> >--cumulate'. Am I missing something?
>
> Yes - I meant to say perf report -g :)
>
> > -g [type,min[,limit],order]
>
> Specifically, along with callee, caller, we could have a third
> option. Or we could have a new type (graph, fractal, cumulative).
>
> >>Given that there are clear use cases in production involving complex
> >>callgraphs, I'm for getting this support in first and then reconciling
> >>the differences with perf record -b later.
> >
> >I think what Frederic said is that the code de-duplication of 'perf
> >report' side. The branch stack and --cumulate are different - branch
> >stack concentrates on the branch itself but --cumulate uses callchains
> >to find parents and give some credit to them as side information.
>
> Me too. I brought it up with Stephane at some point in the last year
> or so and there wasn't an obvious way to de-duplicate because of
> these differences.

I agree that the interface is debatable. It could be -g ...,cumulative, expand -b, or whatever.
But the backend is the same: perf_report__add_branch_hist_entry should be shared 80%.