2014-01-14 15:38:26

by Frederic Weisbecker

Subject: perf tools: Random cleanups

Hi,

Just a bunch of non-critical cleanups for comm and callchains. Based on the latest acme:perf/core

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
perf/core-cleanups-for-acme

Thanks,
Frederic
---

Frederic Weisbecker (3):
perf tools: Do proper comm override error handling
perf tools: Spare double comparison of callchain first entry
perf tools: Remove unnecessary callchain cursor state restore on unmatch


tools/perf/util/callchain.c | 23 ++++++++++-------------
tools/perf/util/comm.c | 19 ++++++++++---------
tools/perf/util/comm.h | 2 +-
tools/perf/util/thread.c | 5 ++++-
4 files changed, 25 insertions(+), 24 deletions(-)


2014-01-14 15:37:35

by Frederic Weisbecker

Subject: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

When a new callchain child branch matches an existing one in the rbtree,
the comparison of its first entry is performed twice:

1) From append_chain_children() on branch lookup

2) If 1) reports a match, append_chain() then compares all entries of
the new branch against the matching node in the rbtree, and this
comparison includes the first entry of the new branch again.

Let's shortcut this by performing the whole comparison only from
append_chain(), which then returns the result of the comparison between
the first entry of the new branch and the iterating node in the rbtree.
If the first entry matches, the lookup on the current level of siblings
stops and propagates to the children of the matching nodes.

This results in fewer comparisons performed by the CPU.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
---
tools/perf/util/callchain.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e3970e3..e5ed16d 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -15,6 +15,8 @@
#include <errno.h>
#include <math.h>

+#include "asm/bug.h"
+
#include "hist.h"
#include "util.h"
#include "callchain.h"
@@ -356,19 +358,14 @@ append_chain_children(struct callchain_node *root,
/* lookup in childrens */
while (*p) {
s64 ret;
- struct callchain_list *cnode;

parent = *p;
rnode = rb_entry(parent, struct callchain_node, rb_node_in);
- cnode = list_first_entry(&rnode->val, struct callchain_list,
- list);

- /* just check first entry */
- ret = match_chain(node, cnode);
- if (ret == 0) {
- append_chain(rnode, cursor, period);
+ /* If at least first entry matches, rely to children */
+ ret = append_chain(rnode, cursor, period);
+ if (ret == 0)
goto inc_children_hit;
- }

if (ret < 0)
p = &parent->rb_left;
@@ -394,6 +391,7 @@ append_chain(struct callchain_node *root,
u64 start = cursor->pos;
bool found = false;
u64 matches;
+ int cmp = 0;

/*
* Lookup in the current node
@@ -408,7 +406,8 @@ append_chain(struct callchain_node *root,
if (!node)
break;

- if (match_chain(node, cnode) != 0)
+ cmp = match_chain(node, cnode);
+ if (cmp)
break;

found = true;
@@ -418,9 +417,10 @@ append_chain(struct callchain_node *root,

/* matches not, relay no the parent */
if (!found) {
+ WARN_ONCE(!cmp, "Chain comparison error\n");
cursor->curr = curr_snap;
cursor->pos = start;
- return -1;
+ return cmp;
}

matches = cursor->pos - start;
--
1.8.3.1

2014-01-14 15:37:39

by Frederic Weisbecker

Subject: [PATCH 1/3] perf tools: Do proper comm override error handling

The comm overriding API ignores memory allocation failures by silently
keeping the previous, out-of-date comm.

As a result, the user may get buggy events without ever being notified
about the problem and its source.

Let's start to fix this by propagating the error from the API. Not all
callers may be doing proper error handling on comm set yet, but this
is the first step toward it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
---
tools/perf/util/comm.c | 19 ++++++++++---------
tools/perf/util/comm.h | 2 +-
tools/perf/util/thread.c | 5 ++++-
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index 67d1e40..f9e7776 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -94,19 +94,20 @@ struct comm *comm__new(const char *str, u64 timestamp)
return comm;
}

-void comm__override(struct comm *comm, const char *str, u64 timestamp)
+int comm__override(struct comm *comm, const char *str, u64 timestamp)
{
- struct comm_str *old = comm->comm_str;
+ struct comm_str *new, *old = comm->comm_str;

- comm->comm_str = comm_str__findnew(str, &comm_str_root);
- if (!comm->comm_str) {
- comm->comm_str = old;
- return;
- }
+ new = comm_str__findnew(str, &comm_str_root);
+ if (!new)
+ return -ENOMEM;

- comm->start = timestamp;
- comm_str__get(comm->comm_str);
+ comm_str__get(new);
comm_str__put(old);
+ comm->comm_str = new;
+ comm->start = timestamp;
+
+ return 0;
}

void comm__free(struct comm *comm)
diff --git a/tools/perf/util/comm.h b/tools/perf/util/comm.h
index 7a86e56..fac5bd5 100644
--- a/tools/perf/util/comm.h
+++ b/tools/perf/util/comm.h
@@ -16,6 +16,6 @@ struct comm {
void comm__free(struct comm *comm);
struct comm *comm__new(const char *str, u64 timestamp);
const char *comm__str(const struct comm *comm);
-void comm__override(struct comm *comm, const char *str, u64 timestamp);
+int comm__override(struct comm *comm, const char *str, u64 timestamp);

#endif /* __PERF_COMM_H */
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index e394861..0358882 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -66,10 +66,13 @@ struct comm *thread__comm(const struct thread *thread)
int thread__set_comm(struct thread *thread, const char *str, u64 timestamp)
{
struct comm *new, *curr = thread__comm(thread);
+ int err;

/* Override latest entry if it had no specific time coverage */
if (!curr->start) {
- comm__override(curr, str, timestamp);
+ err = comm__override(curr, str, timestamp);
+ if (err)
+ return err;
} else {
new = comm__new(str, timestamp);
if (!new)
--
1.8.3.1

2014-01-14 15:38:03

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/3] perf tools: Remove unnecessary callchain cursor state restore on unmatch

If a new callchain branch doesn't match a single entry of the node that
it is compared against in append_chain(), then the cursor is expected to
be at the same position as it was before the comparison loop.

As such, there is no need to restore the cursor position on exit in the
case of non-matching branches.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
---
tools/perf/util/callchain.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e5ed16d..85a54ad 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -386,7 +386,6 @@ append_chain(struct callchain_node *root,
struct callchain_cursor *cursor,
u64 period)
{
- struct callchain_cursor_node *curr_snap = cursor->curr;
struct callchain_list *cnode;
u64 start = cursor->pos;
bool found = false;
@@ -418,8 +417,6 @@ append_chain(struct callchain_node *root,
/* matches not, relay no the parent */
if (!found) {
WARN_ONCE(!cmp, "Chain comparison error\n");
- cursor->curr = curr_snap;
- cursor->pos = start;
return cmp;
}

--
1.8.3.1

2014-01-15 05:54:31

by Namhyung Kim

Subject: Re: [PATCH 1/3] perf tools: Do proper comm override error handling

Hi Frederic,

On Tue, 14 Jan 2014 16:37:14 +0100, Frederic Weisbecker wrote:
> The comm overriding API ignores memory allocation failures by silently
> keeping the previous and out of date comm.
>
> As a result, the user may get buggy events without ever being notified
> about the problem and its source.
>
> Lets start to fix this by propagating the error from the API. Not all
> callers may be doing proper error handling on comm set yet but this
> is the first step toward it.

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung

2014-01-15 06:24:02

by Namhyung Kim

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote:
> When a new callchain child branch matches an existing one in the rbtree,
> the comparison of its first entry is performed twice:
>
> 1) From append_chain_children() on branch lookup
>
> 2) If 1) reports a match, append_chain() then compares all entries of
> the new branch against the matching node in the rbtree, and this
> comparison includes the first entry of the new branch again.

Right.

>
> Lets shortcut this by performing the whole comparison only from
> append_chain() which then returns the result of the comparison between
> the first entry of the new branch and the iterating node in the rbtree.
> If the first entry matches, the lookup on the current level of siblings
> stops and propagates to the children of the matching nodes.

Hmm.. it seems I thought directly calling append_chain() would have
some overhead - but it doesn't.

>
> This results in less comparisons performed by the CPU.

Do you have any numbers? I suspect it'd not be a big change, but just
curious.

>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung

2014-01-15 06:24:51

by Namhyung Kim

Subject: Re: [PATCH 3/3] perf tools: Remove unnecessary callchain cursor state restore on unmatch

On Tue, 14 Jan 2014 16:37:16 +0100, Frederic Weisbecker wrote:
> If a new callchain branch doesn't match a single entry of the node that
> it is given against comparison in append_chain(), then the cursor is
> expected to be at the same position as it was before the comparison loop.
>
> As such, there is no need to restore the cursor position on exit in case
> of non matching branches.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung

2014-01-15 16:59:36

by Frederic Weisbecker

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

On Wed, Jan 15, 2014 at 03:23:46PM +0900, Namhyung Kim wrote:
> On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote:
> > When a new callchain child branch matches an existing one in the rbtree,
> > the comparison of its first entry is performed twice:
> >
> > 1) From append_chain_children() on branch lookup
> >
> > 2) If 1) reports a match, append_chain() then compares all entries of
> > the new branch against the matching node in the rbtree, and this
> > comparison includes the first entry of the new branch again.
>
> Right.
>
> >
> > Lets shortcut this by performing the whole comparison only from
> > append_chain() which then returns the result of the comparison between
> > the first entry of the new branch and the iterating node in the rbtree.
> > If the first entry matches, the lookup on the current level of siblings
> > stops and propagates to the children of the matching nodes.
>
> Hmm.. it looks like that I thought directly calling append_chain() has
> some overhead - but it's not.

No, that's a valid concern. I worried as well because I wasn't sure whether
matches or mismatches are more common on the first entry. I'd tend to think
that the first entry mismatches most often, in which case calling
match_chain() first may be more efficient as a fast path (ie: calling
append_chain() involves one more function call and a few other details).

But in the end, measurements haven't shown a significant difference before
and after the patch.

>
> >
> > This results in less comparisons performed by the CPU.
>
> Do you have any numbers? I suspect it'd not be a big change, but just
> curious.

So I compared before/after the patchset (which includes the cursor restore
removal) with:

1) Some big hackbench-like load that generates > 200 MB perf.data

perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER

2) Compare before/after with the following reports:

perf stat perf report --stdio > /dev/null
perf stat perf report --stdio -s sym > /dev/null
perf stat perf report --stdio -G > /dev/null
perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null

And most of the time I had a < 0.01% difference in completion time in favour
of the patchset (which may be due to the removed cursor restore patch).

So, all in all, there was no real interesting difference. If you want the raw
results I can definitely relaunch the tests.

> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
>
> Reviewed-by: Namhyung Kim <[email protected]>

Thanks!

>
> Thanks,
> Namhyung

2014-01-16 01:18:01

by Namhyung Kim

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

Hi Frederic,

On Wed, 15 Jan 2014 17:59:30 +0100, Frederic Weisbecker wrote:
> On Wed, Jan 15, 2014 at 03:23:46PM +0900, Namhyung Kim wrote:
>> On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote:
>> > When a new callchain child branch matches an existing one in the rbtree,
>> > the comparison of its first entry is performed twice:
>> >
>> > 1) From append_chain_children() on branch lookup
>> >
>> > 2) If 1) reports a match, append_chain() then compares all entries of
>> > the new branch against the matching node in the rbtree, and this
>> > comparison includes the first entry of the new branch again.
>>
>> Right.
>>
>> >
>> > Lets shortcut this by performing the whole comparison only from
>> > append_chain() which then returns the result of the comparison between
>> > the first entry of the new branch and the iterating node in the rbtree.
>> > If the first entry matches, the lookup on the current level of siblings
>> > stops and propagates to the children of the matching nodes.
>>
>> Hmm.. it looks like that I thought directly calling append_chain() has
>> some overhead - but it's not.
>
> No that's a right concern. I worried as well because I wasn't sure if there
> is more match than unmatch on the first entry. I'd tend to think that the first
> entry endures unmatches most often, in which case calling match_chain() first
> may be more efficient as a fast path (ie: calling append_chain() involves
> one more function call and a few other details).
>
> But eventually measurement hasn't shown significant difference before and
> after the patch.

I think if the sort key doesn't contain "symbol", the mismatch case would
increase, as more varied callchains would go into the same entry.

>
>>
>> >
>> > This results in less comparisons performed by the CPU.
>>
>> Do you have any numbers? I suspect it'd not be a big change, but just
>> curious.
>
> So I compared before/after the patchset (which include the cursor restore removal)
> with:
>
> 1) Some big hackbench-like load that generates > 200 MB perf.data
>
> perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER
>
> 2) Compare before/after with the following reports:
>
> perf stat perf report --stdio > /dev/null
> perf stat perf report --stdio -s sym > /dev/null
> perf stat perf report --stdio -G > /dev/null
> perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null
>
> And most of the time I had < 0.01% difference on time completion in favour of the patchset
> (which may be due to the removed cursor restore patch eventually).
>
> So, all in one, there was no real interesting difference. If you want the true results I can definetly relaunch the tests.

So as an extreme case, could you please also test "-s cpu" case and
share the numbers?

Thanks,
Namhyung

2014-01-16 17:35:07

by Frederic Weisbecker

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

On Thu, Jan 16, 2014 at 10:17:53AM +0900, Namhyung Kim wrote:
> I think if the sort key doesn't contain "symbol", unmatch case would be
> increased as more various callchains would go into a same entry.

You mean -g fractal,0.5,callee,address ?

Hmm, actually I haven't seen much difference there.

> >
> >>
> >> >
> >> > This results in less comparisons performed by the CPU.
> >>
> >> Do you have any numbers? I suspect it'd not be a big change, but just
> >> curious.
> >
> > So I compared before/after the patchset (which include the cursor restore removal)
> > with:
> >
> > 1) Some big hackbench-like load that generates > 200 MB perf.data
> >
> > perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER
> >
> > 2) Compare before/after with the following reports:
> >
> > perf stat perf report --stdio > /dev/null
> > perf stat perf report --stdio -s sym > /dev/null
> > perf stat perf report --stdio -G > /dev/null
> > perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null
> >
> > And most of the time I had < 0.01% difference on time completion in favour of the patchset
> > (which may be due to the removed cursor restore patch eventually).
> >
> > So, all in one, there was no real interesting difference. If you want the true results I can definetly relaunch the tests.
>
> So as an extreme case, could you please also test "-s cpu" case and
> share the numbers?

There is indeed a tiny difference here.

Before the patchset:

fweisbec@Aivars:~/linux-2.6-tip/tools/perf$ sudo ./perf stat -r 20 ./perf report --stdio -s cpu > /dev/null

Performance counter stats for './perf report --stdio -s cpu' (20 runs):

3343,047232 task-clock (msec) # 0,999 CPUs utilized ( +- 0,12% )
6 context-switches # 0,002 K/sec ( +- 3,82% )
0 cpu-migrations # 0,000 K/sec
128 076 page-faults # 0,038 M/sec ( +- 0,00% )
13 044 840 323 cycles # 3,902 GHz ( +- 0,12% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
16 341 506 514 instructions # 1,25 insns per cycle ( +- 0,00% )
4 042 448 707 branches # 1209,211 M/sec ( +- 0,00% )
26 819 441 branch-misses # 0,66% of all branches ( +- 0,09% )

3,345286450 seconds time elapsed ( +- 0,12% )

After the patchset:

fweisbec@Aivars:~/linux-2.6-tip/tools/perf$ sudo ./perf stat -r 20 ./perf report --stdio -s cpu > /dev/null

Performance counter stats for './perf report --stdio -s cpu' (20 runs):

3365,739972 task-clock (msec) # 0,999 CPUs utilized ( +- 0,12% )
6 context-switches # 0,002 K/sec ( +- 2,99% )
0 cpu-migrations # 0,000 K/sec
128 076 page-faults # 0,038 M/sec ( +- 0,00% )
13 133 593 870 cycles # 3,902 GHz ( +- 0,12% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
16 626 286 378 instructions # 1,27 insns per cycle ( +- 0,00% )
4 119 555 502 branches # 1223,967 M/sec ( +- 0,00% )
28 687 283 branch-misses # 0,70% of all branches ( +- 0,09% )

3,367984867 seconds time elapsed ( +- 0,12% )


Which makes about a 0.6% difference in overhead.
On the other hand, it had less overhead in the common cases (default sorting, -s sym, -G, etc...).
I guess it's not really worrisome; it's mostly invisible at this scale.

2014-01-16 19:47:48

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

Em Thu, Jan 16, 2014 at 06:34:58PM +0100, Frederic Weisbecker escreveu:
> On Thu, Jan 16, 2014 at 10:17:53AM +0900, Namhyung Kim wrote:
> > I think if the sort key doesn't contain "symbol", unmatch case would be
> > increased as more various callchains would go into a same entry.
>
> You mean -g fractal,0.5,callee,address ?
>
> Hmm, actually I haven't seen much difference there.

I guess he will, but will wait for Namhyung's final ack here, ok?

- Arnaldo

> > >
> > >>
> > >> >
> > >> > This results in less comparisons performed by the CPU.
> > >>
> > >> Do you have any numbers? I suspect it'd not be a big change, but just
> > >> curious.
> > >
> > > So I compared before/after the patchset (which include the cursor restore removal)
> > > with:
> > >
> > > 1) Some big hackbench-like load that generates > 200 MB perf.data
> > >
> > > perf record -g -- perf bench sched messaging -l $SOME_BIG_NUMBER
> > >
> > > 2) Compare before/after with the following reports:
> > >
> > > perf stat perf report --stdio > /dev/null
> > > perf stat perf report --stdio -s sym > /dev/null
> > > perf stat perf report --stdio -G > /dev/null
> > > perf stat perf report --stdio -g fractal,0.5,caller,address > /dev/null
> > >
> > > And most of the time I had < 0.01% difference on time completion in favour of the patchset
> > > (which may be due to the removed cursor restore patch eventually).
> > >
> > > So, all in one, there was no real interesting difference. If you want the true results I can definetly relaunch the tests.
> >
> > So as an extreme case, could you please also test "-s cpu" case and
> > share the numbers?
>
> There is indeed a tiny difference here.
>
> Before the patchset:
>
> fweisbec@Aivars:~/linux-2.6-tip/tools/perf$ sudo ./perf stat -r 20 ./perf report --stdio -s cpu > /dev/null
>
> Performance counter stats for './perf report --stdio -s cpu' (20 runs):
>
> 3343,047232 task-clock (msec) # 0,999 CPUs utilized ( +- 0,12% )
> 6 context-switches # 0,002 K/sec ( +- 3,82% )
> 0 cpu-migrations # 0,000 K/sec
> 128 076 page-faults # 0,038 M/sec ( +- 0,00% )
> 13 044 840 323 cycles # 3,902 GHz ( +- 0,12% )
> <not supported> stalled-cycles-frontend
> <not supported> stalled-cycles-backend
> 16 341 506 514 instructions # 1,25 insns per cycle ( +- 0,00% )
> 4 042 448 707 branches # 1209,211 M/sec ( +- 0,00% )
> 26 819 441 branch-misses # 0,66% of all branches ( +- 0,09% )
>
> 3,345286450 seconds time elapsed ( +- 0,12% )
>
> After the patchset:
>
> fweisbec@Aivars:~/linux-2.6-tip/tools/perf$ sudo ./perf stat -r 20 ./perf report --stdio -s cpu > /dev/null
>
> Performance counter stats for './perf report --stdio -s cpu' (20 runs):
>
> 3365,739972 task-clock (msec) # 0,999 CPUs utilized ( +- 0,12% )
> 6 context-switches # 0,002 K/sec ( +- 2,99% )
> 0 cpu-migrations # 0,000 K/sec
> 128 076 page-faults # 0,038 M/sec ( +- 0,00% )
> 13 133 593 870 cycles # 3,902 GHz ( +- 0,12% )
> <not supported> stalled-cycles-frontend
> <not supported> stalled-cycles-backend
> 16 626 286 378 instructions # 1,27 insns per cycle ( +- 0,00% )
> 4 119 555 502 branches # 1223,967 M/sec ( +- 0,00% )
> 28 687 283 branch-misses # 0,70% of all branches ( +- 0,09% )
>
> 3,367984867 seconds time elapsed ( +- 0,12% )
>
>
> Which makes about 0.6% difference on the overhead.
> Now it had less overhead in common cases (default sorting, -s sym, -G, etc...).
> I guess it's not really worrysome, it's mostly unvisible at this scale.

2014-01-17 07:56:23

by Namhyung Kim

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

Hi Arnaldo and Frederic,

On Thu, 16 Jan 2014 17:47:34 -0200, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jan 16, 2014 at 06:34:58PM +0100, Frederic Weisbecker escreveu:
>> On Thu, Jan 16, 2014 at 10:17:53AM +0900, Namhyung Kim wrote:
>> > I think if the sort key doesn't contain "symbol", unmatch case would be
>> > increased as more various callchains would go into a same entry.
>>
>> You mean -g fractal,0.5,callee,address ?
>>
>> Hmm, actually I haven't seen much difference there.
>
> I guess he will, but will wait for Namhyung's final ack here, ok?

I meant hist entry sorting like "-s cpu". So as your numbers said,
it's not an issue IMHO. Arnaldo, I'm good with this change, please
merge it. :)

Thanks,
Namhyung

2014-01-17 16:07:47

by Frederic Weisbecker

Subject: Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

On Fri, Jan 17, 2014 at 04:56:18PM +0900, Namhyung Kim wrote:
> Hi Arnaldo and Frederic,
>
> On Thu, 16 Jan 2014 17:47:34 -0200, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jan 16, 2014 at 06:34:58PM +0100, Frederic Weisbecker escreveu:
> >> On Thu, Jan 16, 2014 at 10:17:53AM +0900, Namhyung Kim wrote:
> >> > I think if the sort key doesn't contain "symbol", unmatch case would be
> >> > increased as more various callchains would go into a same entry.
> >>
> >> You mean -g fractal,0.5,callee,address ?
> >>
> >> Hmm, actually I haven't seen much difference there.
> >
> > I guess he will, but will wait for Namhyung's final ack here, ok?
>
> I meant sorting of hist entry like "-s cpu". So as your number said
> it's not an issue IMHO. Arnaldo, I'm good with this change please merge
> it. :)

Thanks a lot guys! :)

Subject: [tip:perf/core] perf tools: Do proper comm override error handling

Commit-ID: 3178f58b989430fd0721df97bf21cf1c0e8cc419
Gitweb: http://git.kernel.org/tip/3178f58b989430fd0721df97bf21cf1c0e8cc419
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Tue, 14 Jan 2014 16:37:14 +0100
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 16 Jan 2014 16:44:39 -0300

perf tools: Do proper comm override error handling

The comm overriding API ignores memory allocation failures by silently
keeping the previous, out-of-date comm.

As a result, the user may get buggy events without ever being notified
about the problem and its source.

Let's start to fix this by propagating the error from the API. Not all
callers may be doing proper error handling on comm set yet, but this is
the first step toward it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/comm.c | 19 ++++++++++---------
tools/perf/util/comm.h | 2 +-
tools/perf/util/thread.c | 5 ++++-
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index 67d1e40..f9e7776 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -94,19 +94,20 @@ struct comm *comm__new(const char *str, u64 timestamp)
return comm;
}

-void comm__override(struct comm *comm, const char *str, u64 timestamp)
+int comm__override(struct comm *comm, const char *str, u64 timestamp)
{
- struct comm_str *old = comm->comm_str;
+ struct comm_str *new, *old = comm->comm_str;

- comm->comm_str = comm_str__findnew(str, &comm_str_root);
- if (!comm->comm_str) {
- comm->comm_str = old;
- return;
- }
+ new = comm_str__findnew(str, &comm_str_root);
+ if (!new)
+ return -ENOMEM;

- comm->start = timestamp;
- comm_str__get(comm->comm_str);
+ comm_str__get(new);
comm_str__put(old);
+ comm->comm_str = new;
+ comm->start = timestamp;
+
+ return 0;
}

void comm__free(struct comm *comm)
diff --git a/tools/perf/util/comm.h b/tools/perf/util/comm.h
index 7a86e56..fac5bd5 100644
--- a/tools/perf/util/comm.h
+++ b/tools/perf/util/comm.h
@@ -16,6 +16,6 @@ struct comm {
void comm__free(struct comm *comm);
struct comm *comm__new(const char *str, u64 timestamp);
const char *comm__str(const struct comm *comm);
-void comm__override(struct comm *comm, const char *str, u64 timestamp);
+int comm__override(struct comm *comm, const char *str, u64 timestamp);

#endif /* __PERF_COMM_H */
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index e394861..0358882 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -66,10 +66,13 @@ struct comm *thread__comm(const struct thread *thread)
int thread__set_comm(struct thread *thread, const char *str, u64 timestamp)
{
struct comm *new, *curr = thread__comm(thread);
+ int err;

/* Override latest entry if it had no specific time coverage */
if (!curr->start) {
- comm__override(curr, str, timestamp);
+ err = comm__override(curr, str, timestamp);
+ if (err)
+ return err;
} else {
new = comm__new(str, timestamp);
if (!new)

Subject: [tip:perf/core] perf tools: Remove unnecessary callchain cursor state restore on unmatch

Commit-ID: 2a29190c040c0b11e39197c67abf6f87e0a61f9a
Gitweb: http://git.kernel.org/tip/2a29190c040c0b11e39197c67abf6f87e0a61f9a
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Tue, 14 Jan 2014 16:37:16 +0100
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 17 Jan 2014 11:25:24 -0300

perf tools: Remove unnecessary callchain cursor state restore on unmatch

If a new callchain branch doesn't match a single entry of the node that
it is compared against in append_chain(), then the cursor is expected to
be at the same position as it was before the comparison loop.

As such, there is no need to restore the cursor position on exit in the
case of non-matching branches.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/callchain.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 662867d..8d9db45 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -388,7 +388,6 @@ append_chain(struct callchain_node *root,
struct callchain_cursor *cursor,
u64 period)
{
- struct callchain_cursor_node *curr_snap = cursor->curr;
struct callchain_list *cnode;
u64 start = cursor->pos;
bool found = false;
@@ -420,8 +419,6 @@ append_chain(struct callchain_node *root,
/* matches not, relay no the parent */
if (!found) {
WARN_ONCE(!cmp, "Chain comparison error\n");
- cursor->curr = curr_snap;
- cursor->pos = start;
return cmp;
}

Subject: [tip:perf/core] perf callchain: Spare double comparison of callchain first entry

Commit-ID: b965bb41061ad8d3eafda6e7feef89279fcd3916
Gitweb: http://git.kernel.org/tip/b965bb41061ad8d3eafda6e7feef89279fcd3916
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Tue, 14 Jan 2014 16:37:15 +0100
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 17 Jan 2014 11:11:01 -0300

perf callchain: Spare double comparison of callchain first entry

When a new callchain child branch matches an existing one in the rbtree,
the comparison of its first entry is performed twice:

1) From append_chain_children() on branch lookup

2) If 1) reports a match, append_chain() then compares all entries of
the new branch against the matching node in the rbtree, and this
comparison includes the first entry of the new branch again.

Let's shortcut this by performing the whole comparison only from
append_chain(), which then returns the result of the comparison between
the first entry of the new branch and the iterating node in the rbtree.
If the first entry matches, the lookup on the current level of siblings
stops and propagates to the children of the matching nodes.

This results in fewer comparisons performed by the CPU.

Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/callchain.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 9eb4f57..662867d 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -15,6 +15,8 @@
#include <errno.h>
#include <math.h>

+#include "asm/bug.h"
+
#include "hist.h"
#include "util.h"
#include "sort.h"
@@ -358,19 +360,14 @@ append_chain_children(struct callchain_node *root,
/* lookup in childrens */
while (*p) {
s64 ret;
- struct callchain_list *cnode;

parent = *p;
rnode = rb_entry(parent, struct callchain_node, rb_node_in);
- cnode = list_first_entry(&rnode->val, struct callchain_list,
- list);

- /* just check first entry */
- ret = match_chain(node, cnode);
- if (ret == 0) {
- append_chain(rnode, cursor, period);
+ /* If at least first entry matches, rely to children */
+ ret = append_chain(rnode, cursor, period);
+ if (ret == 0)
goto inc_children_hit;
- }

if (ret < 0)
p = &parent->rb_left;
@@ -396,6 +393,7 @@ append_chain(struct callchain_node *root,
u64 start = cursor->pos;
bool found = false;
u64 matches;
+ int cmp = 0;

/*
* Lookup in the current node
@@ -410,7 +408,8 @@ append_chain(struct callchain_node *root,
if (!node)
break;

- if (match_chain(node, cnode) != 0)
+ cmp = match_chain(node, cnode);
+ if (cmp)
break;

found = true;
@@ -420,9 +419,10 @@ append_chain(struct callchain_node *root,

/* matches not, relay no the parent */
if (!found) {
+ WARN_ONCE(!cmp, "Chain comparison error\n");
cursor->curr = curr_snap;
cursor->pos = start;
- return -1;
+ return cmp;
}

matches = cursor->pos - start;